This is sample data from a file. I want to split each line in the file and add it to a dataframe. Some parents have more than one child; whenever that happens, a new pair of columns (Child2 name and DOB) has to be added.
(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024
(P324) Shiva Bhupati 01/01/1994 – Vinitha B 04/08/2024
(P356) Karthikeyan chandrashekar 22/02/1991 – Kanishka P 10/03/2014
(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998
This is the code I have tried but this splits only by taking "-" into consideration
with open("text.txt") as read_file:
    file_contents = read_file.readlines()
content_list = []
temp = []
for each_line in file_contents:
    temp = each_line.replace("–", " ").split()
    content_list.append(temp)
print(content_list)
Current output:
[['(P322)', 'Rashmika', 'Chadda', '15/05/1995', 'Rashmi', 'Chadda', 'Teega', '12/02/2024'], ['(P324)', 'Shiva', 'Bhupati', '01/01/1994', 'Vinitha', 'B', 'Sahu', '04/08/2024'], ['(P356)', 'Karthikeyan', 'chandrashekar', '22/02/1991', 'Kanishka', 'P', '10/03/2014'], ['(P366)', 'Kalyani', 'Manoj', '23/01/1975', '-', 'Vandana', 'M', '15/05/1995', '-', 'Chandana', 'M', '18/11/1998']]
Final output should be like below
Code | Parent_Name               | DOB        | Child1_Name | DOB        | Child2_Name | DOB
P322 | Rashmika Chadda           | 15/05/1995 | Rashmi C    | 12/02/2024 |             |
P324 | Shiva Bhupati             | 01/01/1994 | Vinitha B   | 04/08/2024 |             |
P356 | Karthikeyan chandrashekar | 22/02/1991 | Kanishka P  | 10/03/2014 |             |
P366 | Kalyani Manoj             | 23/01/1975 | Vandana M   | 15/05/1995 | Chandana M  | 18/11/1998
I'm not sure whether you want a list or something else.
To get lists:
import re

with open("text.txt") as read_file:
    text = read_file.readlines()

result = []
for t in text:
    # remove the \n at the end of each line
    t = t.strip()
    # remove the parentheses you don't want
    t = t.replace("(", "").replace(")", "")
    # split on the dash between people (the sample uses both "–" and "-")
    t = re.split(r"\s+[–-]\s+", t)
    # reconstruct: the code first, then a name and a DOB per person
    for i, person in enumerate(t):
        person = person.split(" ")
        # remove code
        if i == 0:
            res = [person.pop(0)]
        res.extend([" ".join(person[:2]), person[2]])
    result.append(res)
print(result)
Which would give the below output:
[['P322', 'Rashmika Chadda', '15/05/1995', 'Rashmi C', '12/02/2024'], ['P324', 'Shiva Bhupati', '01/01/1994', 'Vinitha B', '04/08/2024'], ['P356', 'Karthikeyan chandrashekar', '22/02/1991', 'Kanishka P', '10/03/2014'], ['P366', 'Kalyani Manoj', '23/01/1975', 'Vandana M', '15/05/1995', 'Chandana M', '18/11/1998']]
You can organise the data a bit more using a dictionary:
import re

# text is the list of lines read from the file
result = {}
for t in text:
    # remove the \n at the end of each line
    t = t.strip()
    # remove the parentheses you don't want
    t = t.replace("(", "").replace(")", "")
    # split on the dash between people ("–" or "-")
    t = re.split(r"\s+[–-]\s+", t)
    for i, person in enumerate(t):
        # split name
        person = person.split(" ")
        if i == 0:
            # remove code
            code = person.pop(0)
            result[code] = {"parent_name": " ".join(person[:2]), "parent_DOB": person[2], "children": []}
        else:
            result[code]["children"].append({f"child{i}_name": " ".join(person[:2]), f"child{i}_DOB": person[2]})
print(result)
Which would give this output:
{'P322': {'children': [{'child1_DOB': '12/02/2024',
                        'child1_name': 'Rashmi C'}],
          'parent_DOB': '15/05/1995',
          'parent_name': 'Rashmika Chadda'},
 'P324': {'children': [{'child1_DOB': '04/08/2024',
                        'child1_name': 'Vinitha B'}],
          'parent_DOB': '01/01/1994',
          'parent_name': 'Shiva Bhupati'},
 'P356': {'children': [{'child1_DOB': '10/03/2014',
                        'child1_name': 'Kanishka P'}],
          'parent_DOB': '22/02/1991',
          'parent_name': 'Karthikeyan chandrashekar'},
 'P366': {'children': [{'child1_DOB': '15/05/1995',
                        'child1_name': 'Vandana M'},
                       {'child2_DOB': '18/11/1998', 'child2_name': 'Chandana M'}],
          'parent_DOB': '23/01/1975',
          'parent_name': 'Kalyani Manoj'}}
Finally, to get an actual table you would need to use pandas, but that requires fixing the maximum number of children so that the empty cells can be padded.
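The padding step can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation: `parse_line` and `to_table` are hypothetical helper names, the column labels are illustrative, and it assumes that the last token of each person is the DOB and everything before it is the name. The maximum number of children is computed from the data instead of being fixed in advance.

```python
import re

def parse_line(line):
    """Return [code, parent_name, parent_dob, child1_name, child1_dob, ...]."""
    line = line.strip().replace("(", "").replace(")", "")
    row = []
    # People are separated by an en dash or a hyphen surrounded by spaces.
    for i, person in enumerate(re.split(r"\s+[–-]\s+", line)):
        tokens = person.split()
        if i == 0:
            row.append(tokens.pop(0))  # the code, e.g. P322
        # Assumption: the last token is the DOB, the rest is the name.
        row.extend([" ".join(tokens[:-1]), tokens[-1]])
    return row

def to_table(lines):
    """Build column headers and rows padded to the widest row."""
    rows = [parse_line(line) for line in lines if line.strip()]
    max_children = max(((len(r) - 3) // 2 for r in rows), default=0)
    columns = ["Code", "Parent_Name", "Parent_DOB"]
    for n in range(1, max_children + 1):
        columns += [f"Child{n}_Name", f"Child{n}_DOB"]
    # Pad shorter rows with empty strings so every row has the same width.
    padded = [r + [""] * (len(columns) - len(r)) for r in rows]
    return columns, padded

sample_lines = [
    "(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024",
    "(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998",
]
columns, rows = to_table(sample_lines)
print(columns)
for row in rows:
    print(row)
```

If pandas is installed, the two return values could then be passed on as `pandas.DataFrame(rows, columns=columns)`.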
I have some arrays that contain the same customer names in different orders. What I am trying to do is the following:
1 - Match the entries by customer name, so that the differing order no longer matters;
2 - After matching, produce the output described below.
The solution below works when I define the arrays manually, but when the data comes from the database I get an error.
This works:
array1 = [['CLIENT1', '2', '3'],['CLIENT2', '3', '4'],['CLIENT3', '4', '5']]
array2 = [['CLIENT3', '2', '3'],['CLIENT2', '3', '4'],['CLIENT1', '4', '5']]
array3 = [['CLIENT2', '2', '3'],['CLIENT1', '3', '4'],['CLIENT3', '4', '5']]
The output should look like this: Customer Name, value contained in array1 for this customer name, value contained in array2 for this customer name and value contained in array3 for this customer name
SCRIPT
#!/usr/bin/python
# -*- coding: utf-8 -*-
import psycopg2
from datetime import datetime, date, time, timedelta
# create script head
print ('----------------------------------------------------------------------------')
print ('Initializing script: '+str(date.today()))
print ('----------------------------------------------------------------------------')
################################################################################
# Set connection to postgres
connpostgres = psycopg2.connect("host='192.168.0.245'"
                                " dbname='metrics'"
                                " user='postgres'"
                                " password=pass123")
cursorpost = connpostgres.cursor()
################################################################################
# Create arrays
################################################################################
cursorpost.execute(rz_collect)
rz_collect = cursorpost.fetchall()
array_rz_collect = []
for row in rz_collect:
    array_rz_collect.append(row)
cursorpost.execute(sql_on_off)
sql_on_off = cursorpost.fetchall()
array_sql_on_off = []
for row in sql_on_off:
    array_sql_on_off.append(row)
cursorpost.execute(sql_gaps_so)
sql_gaps_so = cursorpost.fetchall()
array_sql_gaps_so = []
for row in sql_gaps_so:
    array_sql_gaps_so.append(row)
cursorpost.execute(sql_gaps_db)
sql_gaps_db = cursorpost.fetchall()
array_sql_gaps_db = []
for row in sql_gaps_db:
    array_sql_gaps_db.append(row)
cursorpost.execute(sql_gaps_sap)
sql_gaps_sap = cursorpost.fetchall()
array_sql_gaps_sap = []
for row in sql_gaps_sap:
    array_sql_gaps_sap.append(row)
################################################################################
# Check and align arrays
# Initialize a dictionary with key = client name, value = list of client entries
result = {}
# Add values from array1
for client_info in array_rz_collect:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    print(client_values)
    # Add previous values if existing
    if client_name in result.keys():
        client_values.extend(result[client_name])
    # Update clients dictionary
    result[client_name] = client_values
# Add values from array2
for client_info in array_sql_on_off:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    # Add previous values if existing
    if client_name in result.keys():
        client_values.extend(result[client_name])
    # Update clients dictionary
    result[client_name] = client_values
# Add values from array3
for client_info in array_sql_gaps_so:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    # Add previous values if existing
    if client_name in result.keys():
        client_values.extend(result[client_name])
    # Update clients dictionary
    result[client_name] = client_values
# Print result information
for client_name, client_values in result.items():
    print("Result: " + str(client_name) + ", " + str(client_values))
OUTPUT
File "SCRIPT.py", line 166, in <module>
client_values.extend(result[client_name])
AttributeError: 'tuple' object has no attribute 'extend'
DESIRED OUTPUT
Result: CLIENT1, ['3', '4', '4', '5', '2', '3']
Result: CLIENT2, ['2', '3', '3', '4', '3', '4']
Result: CLIENT3, ['4', '5', '2', '3', '4', '5']
The output of cursorpost.fetchall() (rz_collect) is a list of tuples.
In the code,
rz_collect = cursorpost.fetchall()
array_rz_collect = []
for row in rz_collect:
    array_rz_collect.append(row)
array_rz_collect is the same as rz_collect, so it too is a list of tuples.
Side note: that for loop is unnecessary; you can operate directly on rz_collect.
In the code,
for client_info in array_rz_collect:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    print(client_values)
    # Add previous values if existing
    if client_name in result.keys():
        client_values.extend(result[client_name])
client_info is a tuple.
client_values is also a tuple, since slicing a tuple yields another tuple. Tuples have no extend method because, unlike lists, they are immutable.
A simple fix to your problem is to convert the tuple to a list:
client_info = list(client_info) # new line
client_name = client_info[0]
client_values = client_info[1:]
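The same idea can be folded into one helper that handles any number of result sets. This is only a sketch under stated assumptions: `merge_by_client` is a hypothetical name, the rows are the sample arrays from the question rewritten as tuples (the shape that fetchall() returns), and values are appended in array order (array1 first), which differs from the prepend order of the original script.

```python
def merge_by_client(*row_sets):
    """Collect the values of every row under its client name.

    Each row is a sequence whose first item is the client name;
    tuples (as returned by fetchall()) and lists both work.
    """
    merged = {}
    for rows in row_sets:
        for row in rows:
            name = row[0]
            values = list(row[1:])  # a tuple slice is a tuple, so convert it
            merged.setdefault(name, []).extend(values)
    return merged

# Sample data from the question, as tuples.
array1 = [("CLIENT1", "2", "3"), ("CLIENT2", "3", "4"), ("CLIENT3", "4", "5")]
array2 = [("CLIENT3", "2", "3"), ("CLIENT2", "3", "4"), ("CLIENT1", "4", "5")]
array3 = [("CLIENT2", "2", "3"), ("CLIENT1", "3", "4"), ("CLIENT3", "4", "5")]

result = merge_by_client(array1, array2, array3)
for client_name, client_values in result.items():
    print("Result: " + client_name + ", " + str(client_values))
```

Using setdefault removes the need for a separate "is the key already there" branch in each loop.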
I am a bit stuck reading a file block-wise, and I am having difficulty extracting selected data from each block.
Here is my file content:
DATA.txt
#-----FILE-----STARTS-----HERE--#
#--COMMENTS CAN BE ADDED HERE--#
BLOCK IMPULSE DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=1021055:lr=1: \
USERID=ID=291821 NO_USERS=3 GROUP=ONE id_info=1021055 \
CREATION_DATE=27-JUNE-2013 SN=1021055 KEY ="22WS \
DE34 43RE ED54 GT65 HY67 AQ12 ES23 54CD 87BG 98VC \
4325 BG56"
BLOCK PASSION DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=1 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
DE34 43RE 342E WSEW T54R HY67 TFRT 4ER4 WE23 XS21 \
CD32 12QW"
BLOCK VICTOR DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=5 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
DE34 43RE 342E WSEW T54R HY67 TFRT 4ER4 WE23 XS21 \
CD32 12QW"
#--BLOCK--ENDS--HERE#
#--NEW--BLOCKS--CAN--BE--APPENDED--HERE--#
I am only interested in the block name, NO_USERS and id_info of each block.
These three values should be saved to a data structure (let's say a dict), which is then stored in a list:
[{Name: IMPULSE ,NO_USER=3,id_info=1021055},{Name: PASSION ,NO_USER=1,id_info=324356}. . . ]
Any other data structure that can hold the info would also be fine.
So far I have tried getting the block names by reading line by line:
unique = []
with open('DATA.txt') as fOpen:
    for row in fOpen:
        # only real block lines, not comment lines that mention BLOCK
        if row.startswith("BLOCK"):
            unique.append(row.split()[1])
print(unique)
I am thinking of a regular expression approach, but I have no idea where to start.
Any help would be appreciated. Meanwhile I am also trying, and will update if I get something.
You could use groupby to find each block, use a regex to extract the info and put the values in dicts:
from itertools import groupby
import re

with open("test.txt") as f:
    data = []
    # find NO_USERS= followed by 1+ digits, or id_info= followed by 1+ digits
    r = re.compile(r"NO_USERS=\d+|id_info=\d+")
    grps = groupby(f, key=lambda x: x.strip().startswith("BLOCK"))
    for k, v in grps:
        # if k is True we have a block line
        if k:
            # get name after BLOCK
            name = next(v).split(None, 2)[1]
            # get the group of lines that follows the BLOCK line
            t = next(grps)[1]
            # we want the second line after BLOCK
            _, l = next(t), next(t)
            d = dict(s.split("=") for s in r.findall(l))
            # add name to dict
            d["Name"] = name
            # add dict to data list
            data.append(d)
print(data)
Output:
[{'NO_USERS': '3', 'id_info': '1021055', 'Name': 'IMPULSE'},
{'NO_USERS': '1', 'id_info': '324356', 'Name': 'PASSION'},
{'NO_USERS': '5', 'id_info': '324356', 'Name': 'VICTOR'}]
Or without groupby: as your file follows a fixed format, we just need to extract the second line after the BLOCK line:
import re

with open("test.txt") as f:
    data = []
    r = re.compile(r"NO_USERS=\d+|id_info=\d+")
    for line in f:
        # if True we have a new block
        if line.startswith("BLOCK"):
            # call next twice to get the second line after BLOCK
            _, l = next(f), next(f)
            # get name after BLOCK
            name = line.split(None, 2)[1]
            # find our substrings from l
            d = dict(s.split("=") for s in r.findall(l))
            d["Name"] = name
            data.append(d)
print(data)
Output:
[{'NO_USERS': '3', 'id_info': '1021055', 'Name': 'IMPULSE'},
{'NO_USERS': '1', 'id_info': '324356', 'Name': 'PASSION'},
{'NO_USERS': '5', 'id_info': '324356', 'Name': 'VICTOR'}]
To extract values you can iterate:
for dct in data:
    print(dct["NO_USERS"])
Output:
3
1
5
If you want a dict of dicts and want to access each section by a number from 1 to n, you can store nested dicts using 1 to n as the keys:
from itertools import count
import re

with open("test.txt") as f:
    data, cn = {}, count(1)
    r = re.compile(r"NO_USERS=\d+|id_info=\d+")
    for line in f:
        if line.startswith("BLOCK"):
            _, l = next(f), next(f)
            name = line.split(None, 2)[1]
            d = dict(s.split("=") for s in r.findall(l))
            d["Name"] = name
            data[next(cn)] = d
    data["num_blocks"] = next(cn) - 1
Output:
from pprint import pprint as pp
pp(data)
{1: {'NO_USERS': '3', 'Name': 'IMPULSE', 'id_info': '1021055'},
2: {'NO_USERS': '1', 'Name': 'PASSION', 'id_info': '324356'},
3: {'NO_USERS': '5', 'Name': 'VICTOR', 'id_info': '324356'},
'num_blocks': 3}
'num_blocks' will tell you exactly how many blocks you extracted.
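The snippets above rely on the wanted fields always sitting on the second line after BLOCK. If that position cannot be guaranteed, a variant that scans every line of a block is one option. This is a minimal sketch: `parse_blocks` is a hypothetical name, and SAMPLE is a shortened copy of the file from the question so that the example runs on its own.

```python
import re

# Shortened copy of the sample file from the question (KEY lines trimmed).
SAMPLE = r'''#--COMMENTS CAN BE ADDED HERE--#
BLOCK IMPULSE DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=1021055:lr=1: \
USERID=ID=291821 NO_USERS=3 GROUP=ONE id_info=1021055 \
CREATION_DATE=27-JUNE-2013 SN=1021055 KEY ="22WS \
4325 BG56"
BLOCK PASSION DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=1 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
CD32 12QW"
#--BLOCK--ENDS--HERE#
'''

def parse_blocks(lines):
    """Return one dict per block holding Name, NO_USERS and id_info."""
    field_re = re.compile(r"\b(NO_USERS|id_info)=(\d+)")
    blocks = []
    for line in lines:
        if line.startswith("BLOCK"):
            # A new block starts; the name is the second token.
            blocks.append({"Name": line.split()[1]})
        if blocks and not line.startswith("#"):
            # findall returns (key, value) pairs, which dict.update accepts.
            blocks[-1].update(field_re.findall(line))
    return blocks

data = parse_blocks(SAMPLE.splitlines())
print(data)
```

For a real file, the lines of an open file object can be passed in place of SAMPLE.splitlines().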
I'm having trouble accessing some values in a dictionary I made. In my code, I made two different dictionaries while reading through a file. The code I have is this:
from collections import defaultdict

nonterminal_rules = defaultdict(list)
terminal_rules = defaultdict(list)
for line in open(file, 'r').readlines():
    LHS, RHS = line.strip().split("->")
    if RHS[1] == "'" and RHS[-1] == "'":
        terminal_rules[LHS].append(RHS.strip())
    else:
        nonterminal_rules[LHS].append(RHS.split())
for i in nonterminal_rules:
    for j in nonterminal_rules[i]:
        if len(j) == 1:
            x = terminal_rules[j[0]]
Here are the keys and values to my dict:
print(self.original_grammar.terminal_rules.items())
dict_items([('NN ', ["'body'", "'case'", "'immunity'", "'malaria'", "'mouse'", "'pathogen'", "'research'", "'researcher'", "'response'", "'sepsis'", "'system'", "'type'", "'vaccine'"]), ('NNS ', ["'cells'", "'fragments'", "'humans'", "'infections'", "'mice'", "'Scientists'"]), ('Prep ', ["'In'", "'with'", "'in'", "'of'", "'by'"]), ('IN ', ["'that'"]), ('Adv ', ["'today'", "'online'"]), ('PRP ', ["'this'", "'them'", "'They'"]), ('Det ', ["'a'", "'A'", "'the'", "'The'"]), ('RP ', ["'down'"]), ('AuxZ ', ["'is'", "'was'"]), ('VBN ', ["'alerted'", "'compromised'", "'made'"]), ('Adj ', ["'dendritic'", "'immune'", "'infected'", "'new'", "'Systemic'", "'weak'", "'whole'", "'live'"]), ('VBN ', ["'discovered'"]), ('Aux ', ["'have'"]), ('VBD ', ["'alerted'", "'injected'", "'published'", "'rescued'", "'restored'", "'was'"]), ('COM ', ["','"]), ('PUNC ', ["'?'", "'.'"]), ('PossPro ', ["'their'", "'Their'"]), ('MD ', ["'Will'"]), ('Conj ', ["'and'"]), ('VBP ', ["'alert'", "'capture'", "'display'", "'have'", "'overstimulate'"]), ('VB ', ["'work'"]), ('VBZ ', ["'invades'", "'is'", "'shuts'"]), ('NNP ', ["'Dr'", "'Jose'", "'Villadangos'"])])
Let's say I have the key-value pair {Aux:["have"]}.
The problem is that if i = Aux, for example, x is just set to an empty list, when I actually want it to be equal to ["have"].
I'm not sure what I'm doing/accessing incorrectly. Any ideas? Thanks!
I'm assuming from reading your code that you want all things that start and end with ', correct? In that case, you probably want
RHS = RHS.strip()
if RHS[0] == "'" and RHS[-1] == "'":
    terminal_rules[LHS].append(RHS)
Since 0 is the index of the first character of the string :) (once any leading whitespace has been stripped). If ' isn't the second character of the split string, then right now it'll add everything to nonterminal_rules.
If you're trying to set terminal_rules to be every key:value pair in nonterminal_rules whose value has length 1, do this:
nonterminal_rules = defaultdict(list)
terminal_rules = defaultdict(list)
for line in open(file, 'r').readlines():
    # Do stuff here as you've done above
    pass
terminal_rules = {key: value for key, value in nonterminal_rules.items() if len(value) == 1}
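One more thing is visible in the printed items in the question: every key carries a trailing space ('NN ', 'Aux '), because splitting on "->" leaves the whitespace around the arrow in place. A lookup with 'Aux' therefore misses the key 'Aux ', and the defaultdict quietly returns an empty list. The sketch below strips both sides before storing them. `load_rules` is a hypothetical name, and the sample rules other than Aux are made up for illustration.

```python
from collections import defaultdict

def load_rules(lines):
    """Split grammar lines into terminal and nonterminal rules."""
    terminal_rules = defaultdict(list)
    nonterminal_rules = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Strip both sides so keys are 'Aux', not 'Aux '.
        lhs, rhs = (part.strip() for part in line.split("->", 1))
        if len(rhs) >= 2 and rhs[0] == "'" and rhs[-1] == "'":
            terminal_rules[lhs].append(rhs)
        else:
            nonterminal_rules[lhs].append(rhs.split())
    return terminal_rules, nonterminal_rules

# Hypothetical sample; only the Aux rule comes from the question.
sample = ["Aux -> 'have'", "VP -> Aux", "S -> NP VP"]
terminal_rules, nonterminal_rules = load_rules(sample)

unit_expansions = {}
for lhs, expansions in nonterminal_rules.items():
    for rhs in expansions:
        if len(rhs) == 1:
            # .get avoids creating empty entries in the defaultdict.
            unit_expansions[lhs] = terminal_rules.get(rhs[0], [])
print(unit_expansions)
```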
I am new to Python and need help. I am trying to build a list of comma-separated values.
I have this data.
EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315
EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343
EasternTerai 799,526 576,181 206,807 2,715 6,636 1,973 5,214
CentralMountain 122,034 103,137 13,047 8 2,819 2,462 561
Now how do I get something like this;
"EasternMountain": 84844,
"EasternHill":346373,
and so on??
So far I have been able to do this:
fileHandle = open("testData", "r")
data = fileHandle.readlines()
fileHandle.close()
dataDict = {}
for i in data:
    temp = i.split(" ")
    dataDict[temp[0]] = temp[1]
    with_comma = '"' + temp[0] + '"' + ':' + temp[1] + ','
    print(with_comma)
Use the csv module:
import csv

with open('k.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    my_dict = {}
    for row in reader:
        # drop the thousands separators from every value after the name
        my_dict[row[0]] = [''.join(e.split(',')) for e in row[1:]]
print(my_dict)
k.csv is a text file containing:
EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315
EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343
EasternTerai 799,526 576,181 206,807 2,715 6,636 1,973 5,214
CentralMountain 122,034 103,137 13,047 8 2,819 2,462 561
Output:
{'EasternHill': ['346373', '166917', '86493', '1573', '66123', '23924', '1343', ''], 'EasternTerai': ['799526', '576181', '206807', '2715', '6636', '1973', '5214', ''], 'CentralMountain': ['122034', '103137', '13047', '8', '2819', '2462', '561', ''], 'EasternMountain': ['84844', '39754', '24509', '286', '16571', '3409', '315', '']}
Try this:
def parser(file_path):
    d = {}
    with open(file_path) as f:
        for line in f:
            # skip blank lines (a line read from a file still ends in \n)
            if not line.strip():
                continue
            parts = line.split()
            d[parts[0]] = [part.replace(',', '') for part in parts[1:]]
    return d
Running it:
result = parser("testData")
for key, value in result.items():
    print(key, ':', value)
Result:
EasternHill : ['346373', '166917', '86493', '1573', '66123', '23924', '1343']
EasternTerai : ['799526', '576181', '206807', '2715', '6636', '1973', '5214']
CentralMountain : ['122034', '103137', '13047', '8', '2819', '2462', '561']
EasternMountain : ['84844', '39754', '24509', '286', '16571', '3409', '315']
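The question asked for each name mapped to its first number as an integer ("EasternMountain": 84844). A minimal sketch of that last step follows; `first_column_as_int` is a hypothetical name, and the sample lines are copied from the question.

```python
def first_column_as_int(lines):
    """Map each region name to its first value, parsed as an int."""
    totals = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        # Drop the thousands separators before converting.
        totals[parts[0]] = int(parts[1].replace(",", ""))
    return totals

sample_lines = [
    "EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315",
    "EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343",
]
totals = first_column_as_int(sample_lines)
for name, value in totals.items():
    print('"%s": %d,' % (name, value))
```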