How to add from file to dictionary? - python

Suppose I have a file that looks like this:
Channel 1
12:30-14:00 Children’s program
17:00-19:00 Afternoon News
8:00-9:00 Morning News
—————————————————————————
Channel 2
19:30-21:00 National Geographic
14:00-15:30 Comedy movies
And so on, for a finite number of channels and programs.
I would like to read the file, create a dictionary, and sort it by channel and by the program that is being shown at a given time. So, something like this:
Channels = {
    "Channel 1": {"Children’s program": "12:30-14:00", "Afternoon News": "17:00-19:00"},
    "Channel 2": {"National Geographic": "19:30-21:00", "Comedy movies": "14:00-15:30"}
}

Try regular expressions on each line you parse, and fill your dictionary depending on what type of line it is, like so:
import re

d = {}
with open("f.txt") as f:
    channel = None
    for l in f.readlines():
        # If channel line format is found
        match_line = re.findall(r"Channel \d*", l)
        if match_line:
            channel = match_line[0]
            d[channel] = {}
        # If program format is found
        match_program = re.findall(r"(\d{1,2}:\d{1,2}-\d{1,2}:\d{1,2}) (.*$)", l)
        if match_program:
            d[channel][match_program[0][1]] = match_program[0][0]
d then equals:
{
    "Channel 1": {
        "Children’s program": "12:30-14:00",
        "Afternoon News": "17:00-19:00",
        "Morning News": "8:00-9:00"
    },
    "Channel 2": {
        "National Geographic": "19:30-21:00",
        "Comedy movies": "14:00-15:30"
    }
}

My approach: first, break the text into blocks, each separated by a dashed line. Next, convert each block into a single key/value pair of a bigger dictionary. Finally, put them all together to form the result.
import json
import pathlib
import re

def parse_channel(text):
    """
    The `text` is a block of text such as:

    Channel 1
    12:30-14:00 Children’s program
    17:00-19:00 Afternoon News
    8:00-9:00 Morning News

    This function will return ("Channel 1", {...}), which are
    the key and value of a bigger dictionary.
    """
    lines = (line.strip() for line in text.splitlines() if line)
    key = next(lines)
    value = {}
    for line in lines:
        time, name = line.split(" ", 1)
        value[name] = time
    return key, value

path = pathlib.Path(__file__).with_name("data.txt")
assert path.exists()
text = path.read_text(encoding="utf-8")

# Break text into blocks, separated by dash lines
blocks = re.split("-{2,}", text)

# Convert each block into key/value, then build the final dictionary
channels = dict(map(parse_channel, blocks))
print(json.dumps(channels, indent=4))
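The question also asks for the programs to be ordered by the time they are shown, which neither snippet above does. Here is a minimal sketch of one way to do it; the sort_programmes helper is my own illustration, not part of either answer, and it assumes the {programme: "start-end"} dict shape built above plus an insertion-ordered dict (Python 3.7+):
from datetime import datetime

def sort_programmes(channels):
    # Return a copy of the parsed dict with each channel's programmes
    # ordered by start time ("H:MM" or "HH:MM" strings as above).
    def start_time(item):
        _, times = item  # item is (programme, "start-end")
        return datetime.strptime(times.split("-")[0], "%H:%M")
    return {
        channel: dict(sorted(programmes.items(), key=start_time))
        for channel, programmes in channels.items()
    }

# e.g. sort_programmes(channels)["Channel 1"] would list Morning News first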

Converting JSON to CSV using Python. How to remove certain text/characters if found, and how to better format the cell?

I apologise in advance if I have not provided enough information, used the wrong terminology, or formatted my question incorrectly. This is my first time asking a question here.
This is the Python script: https://pastebin.com/WWViemwf
This is the script for the JSON file (contains the first 4 elements hydrogen, helium, lithium, beryllium): https://pastebin.com/fyiijpBG
As seen, I'm converting the file from ".json" to ".csv".
The JSON file sometimes contains fields that say "NotApplicable" or "Unknown". Or it will show me weird text that I'm not familiar with.
For example here:
"LiquidDensity": {
"data": "NotAvailable",
"tex_description": "\\text{liquid density}"
},
And here:
"MagneticMoment": {
"data": "Unknown",
"tex_description": "\\text{magnetic dipole moment}"
},
Here is the code I've made to convert from ".json" to ".csv":
#liquid density
liquid_density = element_data["LiquidDensity"]["data"]
if isinstance(liquid_density, dict):
    liquid_density_value = liquid_density["value"]
    liquid_density_unit = liquid_density["tex_unit"]
else:
    liquid_density_value = liquid_density
    liquid_density_unit = ""
However, in the CSV file it shows up like this.
I'm also trying to remove these characters that I'm seeing in the ".csv" file.
In the JSON file, this is how the data is viewed:
"AtomicMass": {
"data": {
"value": "4.002602",
"tex_unit": "\\text{u}"
},
"tex_description": "\\text{atomic mass}"
},
And this is how I coded the conversion in Python:
#atomic mass
atomic_mass = element_data["AtomicMass"]["data"]
if isinstance(atomic_mass, dict):
    atomic_mass_value = atomic_mass["value"]
    atomic_mass_unit = atomic_mass["tex_unit"]
else:
    atomic_mass_value = atomic_mass
    atomic_mass_unit = ""
What have I done wrong?
I've tried replacing:
#melting point
melting_point = element_data["MeltingPoint"]["data"]
if isinstance(melting_point, dict):
    melting_point_value = melting_point["value"]
    melting_point_unit = melting_point["tex_unit"]
else:
    melting_point_value = melting_point
    melting_point_unit = ""
With:
#melting point
melting_point = element_data["MeltingPoint"]["data"]
if isinstance(melting_point, dict):
    melting_point_value = melting_point["value"]
    melting_point_unit = melting_point["tex_unit"]
elif melting_point == "NotApplicable" or melting_point == "Unknown":
    melting_point_value = ""
    melting_point_unit = ""
else:
    melting_point_value = melting_point
    melting_point_unit = ""
However that doesn't seem to work.
Your code is fine; what went wrong is in the writing step. Let me pull out the relevant part of it.
#I will only be using Liquid Density as an example, so I won't be showing the others
headers = [..., "Liquid Density", ...]

#liquid_density data reading part
liquid_density = element_data["LiquidDensity"]["data"]
if isinstance(liquid_density, dict):
    liquid_density_value = liquid_density["value"]
    liquid_density_unit = liquid_density["tex_unit"]
else:
    liquid_density_value = liquid_density
    liquid_density_unit = ""

#your writing of the data into the csv
writer.writerow([..., liquid_density, ...])
You write liquid_density directly into your CSV, which is why it shows the dictionary. If you want to write the value only, change the writerow line to:
writer.writerow([..., liquid_density_value, ...])
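Since the same isinstance check is repeated for every field, one way to cut down on copy/paste mistakes is a small helper. This is only a sketch assuming the field layout shown in the snippets above; extract_value_unit is my own name, not part of the asker's script:
def extract_value_unit(field):
    # field is one entry such as element_data["LiquidDensity"]; plain strings
    # like "NotAvailable", "NotApplicable" or "Unknown" become empty values.
    data = field["data"]
    if isinstance(data, dict):
        return data["value"], data["tex_unit"]
    if data in ("NotAvailable", "NotApplicable", "Unknown"):
        return "", ""
    return data, ""

# instead of repeating the blocks above:
liquid_density_value, liquid_density_unit = extract_value_unit(element_data["LiquidDensity"])
atomic_mass_value, atomic_mass_unit = extract_value_unit(element_data["AtomicMass"])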

Storing into a list schedule from a .txt file

The first part below, which separates the schedule by week, works how I want it to; the trouble starts when I try to store each week in a schedule list that I can iterate over.
Here is my code:
schedule_file = open("sch2019.txt", "r")
schedule_list = schedule_file.readlines()
for x in schedule_list:
    print(x)
This is the output of the code:
week 1 : { ('LAR','CP'), ('KCC','JJ'), ('NYG','DC'), ('BB','NYJ'), ('CIN','SS'), ('DB','OR'), ('WR','PE'), ('GBP','CB'), ('AF','MV'), ('PS','NEP'), ('HT','NOS'), ('IC','LAC'), ('TT','CLV'), ('SF','TBB'), ('DL','AC'), ('BR','MD') }
week 2 : { ('SS','PS'), ('BB','NYG'), ('DC','WR'), ('SF','CIN'), ('IC','TT'), ('JJ','HT'), ('PE','AF'), ('KCC','OR'), ('LAC','DL'), ('CB','DB'), ('AC','BR'),('NOS','LAR'), ('TBB','CP'), ('NEP','MD'), ('MV','GBP'),('CLV','NYJ') }
week 3 : {('NYJ','NEP'), ('PS','SF'), ('HT','LAC'),('NYG','TBB'), ('NOS','SS'), ('DL','PE'), ('OR','MV'), ('DB','GBP'),('LAR','CLV'), ('CB','WR'), ('CP','AC'), ('MD','DC'), ('BR','KCC'), ('CIN','BB'), ('AF','IC'), ('TT','JJ') }
week 4 : { ('TT','AF'), ('SS','AC'), ('CIN','PS'), ('NEP','BB'), ('CLV','BR'), ('OR','IC'),('TBB','LAR'), ('MV','CB'), ('CP','HT'), ('LAC','MD'), ('KCC','DL'), ('JJ','DB'), ('DC','NOS'), ('WR','NYG'), ('PE','GBP') }
I want to store each week into a list schedule, but I don't know how to approach it.
What I tried doing is
schedule_file = open("sch2019.txt", "r")
schedule_list = schedule_file.readlines()
schedule = []
for x in schedule_list:
    for i in x:
        schedule.append(i)
print(schedule)
But all it does is further separate it.
How would I split the list so that each week becomes one entry in schedule?
What I want it to look like:
Schedule[1]← {(LAR,CP), (KCC,JJ), (NYG,DC), (BB,NYJ), (CIN,SS), (DB,OR), (WR,PE),(GBP,CB), (AF,MV), (PS,NEP), (HT,NOS), (IC,LAC), (TT,CLV), (SF,TBB), (DL,AC),(BR,MD)}
Schedule[2]←{(SS,PS), (BB,NYG), (DC,WR), (SF,CIN), (IC,TT), (JJ,HT), (PE,AF), (KCC,OR),(LAC,DL), (CB,DB), (AC,BR), (NOS,LAR), (TBB,CP), (NEP,MD), (MV,GBP), (CLV,NYJ)}
where each schedule[i] contains the values from the corresponding week in the text file.
I think this is what you're after.
This parses each line into a list of fixtures (assuming I've got your input text file formatted correctly), which is then stored in a dict with the week number as the key.
import re

with open('schedule.txt') as f:
    schedule = f.readlines()

def line_parser(line):
    p = re.compile("'[A-Z]{2,3}','[A-Z]{2,3}'")
    fixtures = []
    for match in p.findall(line):
        fixtures.append(match)
    return fixtures

schedule_list = {}
for week, line in enumerate(schedule):
    schedule_list[week] = line_parser(line)

# Print the 10th fixture of the 4th line (enumerate starts the week keys at 0):
print(schedule_list[3][9])
# Output: 'LAC','MD'
I suppose you could break this down further by splitting each fixture into home/away strings stored in a dictionary, but I'm not sure what output format you're expecting.
import re

with open('schedule.txt') as f:
    schedule = f.readlines()

def line_parser(line):
    p = re.compile("[A-Z]{2,3}','[A-Z]{2,3}")
    fixtures = []
    for match in p.findall(line):
        sides = match.split("','")
        fixtures.append({
            "home": sides[0],
            "away": sides[1]
        })
    return fixtures

schedule_list = {}
for week, line in enumerate(schedule):
    schedule_list[week] = line_parser(line)

# Print the 10th fixture of the 4th line (again, the week keys start at 0):
print(schedule_list[3][9])
# Output: {'home': 'LAC', 'away': 'MD'}
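If you want the exact Schedule[1], Schedule[2], ... indexing from the question rather than a zero-based dict, one option is to build a list with a throwaway entry at index 0. This is just a sketch reusing line_parser from above:
# Build a list where schedule[1] is week 1, schedule[2] is week 2, and so on
# (index 0 is left unused so the indices line up with the week numbers).
schedule = [None]
with open('schedule.txt') as f:
    for line in f:
        schedule.append(line_parser(line))

# schedule[1][0] is then the first fixture of week 1,
# e.g. {'home': 'LAR', 'away': 'CP'} with the second parser above.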

Determining a pattern of lines in Python

I'm new to Python and having trouble thinking about this problem Pythonically. I have a text file of SMS messages. There are multi-line statements I'd like to capture.
import fileinput

parsed = {}
for linenum, line in enumerate(fileinput.input()):
    ### Process the input data ###
    try:
        parsed[linenum] = line
    except (KeyError, TypeError, ValueError):
        value = None

###############################################
### Now have dict with value: "data" pairing ##
### for every text message in the archive #####
###############################################

for item in parsed:
    sent_or_rcvd = parsed[item][:4]
    if sent_or_rcvd != "rcvd" and sent_or_rcvd != "sent" and sent_or_rcvd != '--\n':
        ###########################################
        ### Know we have a second or third line ###
        ###########################################
But here's where I hit a wall. I'm not sure what's the best way to contain the strings I get here. I'd love some expert input. Using Python 2.7.3 but glad to move to 3.
Goal: have a human-readable file full of three-line quotes from these SMS.
Example text:
12425234123|2011-03-19 11:03:44|words words words words
12425234123|2011-03-19 11:04:27|words words words words
12425234123|2011-03-19 11:05:04|words words words words
12482904328|2011-03-19 11:13:31|words words words words
--
12482904328|2011-03-19 15:50:48|More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump
--
(Yes, before you ask, that's a haiku about poo. I'm trying to capture them from the last 5 years of texting my best friend.)
Ideally resulting in something like:
Haipu 3
2011-03-19
More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump
import time

data = """12425234123|2011-03-19 11:03:44|words words words words
12425234123|2011-03-19 11:04:27|words words words words
12425234123|2011-03-19 11:05:04|words words words words
12482904328|2011-03-19 11:13:31|words words words words
--
12482904328|2011-03-19 15:50:48|More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump """.splitlines()

def get_haikus(lines):
    haiku = None
    for line in lines:
        try:
            ID, timestamp, txt = line.split('|')
            t = time.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
            ID = int(ID)
            if haiku and len(haiku[1]) == 3:
                yield haiku
            haiku = (timestamp, [txt])
        except ValueError:  # happens on error with split(), time or int conversion
            haiku[1].append(line)
    else:
        yield haiku

# now get_haikus() returns tuples (timestamp, [lines])
for haiku in get_haikus(data):
    timestamp, text = haiku
    date = timestamp.split()[0]
    text = '\n'.join(text)
    print """{d}\n{txt}""".format(d=date, txt=text)
A good start might be something like the following. I'm reading data from a file named data2 but the read_messages generator will consume lines from any iterable.
#!/usr/bin/env python

def read_messages(file_input):
    message = []
    for line in file_input:
        line = line.strip()
        if line[:4].lower() in ('rcvd', 'sent', '--'):
            if message:
                yield message
            message = []
        else:
            message.append(line)
    if message:
        yield message

with open('data2') as file_input:
    for msg in read_messages(file_input):
        print msg
This expects input to look something like the following:
sent
message sent away
it has multiple lines
--
rcvd
message received
rcvd
message sent away
it has multiple lines
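If you only want the three-line messages (the haiku candidates), a small follow-up to the generator above could filter on length. This is just an illustration, not part of the original answer:
# Keep only the three-line messages yielded by read_messages().
with open('data2') as file_input:
    for msg in read_messages(file_input):
        if len(msg) == 3:
            print('\n'.join(msg))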

Checking a text segment within brackets with python

I have a text file, which is structured as follows:
segmentA {
content Aa
content Ab
content Ac
....
}
segmentB {
content Ba
content Bb
content Bc
......
}
segmentC {
content Ca
content Cb
content Cc
......
}
I know how to search for certain strings throughout the whole text file, but how can I limit the search to a certain segment, for example "segmentC"? I need something like a regular expression to tell the script:
If the text begins with "segmentC {", search for a certain string until the first "}" appears.
Does anyone have an idea?
Thanks in advance!
Not a regex solution, but it will do the job!
def SearchStuff(lines, sstr):
    i = 0
    while lines[i].strip() != '}':  # strip the trailing newline from readlines()
        # Do stuff ... for e.g.
        if 'Ca' in lines[i]:
            return lines[i]
        i += 1

def main(search_str):
    f = open('file.txt', 'r')
    lines = f.readlines()
    f.close()
    for line in lines:
        if search_str in line:
            index = lines.index(line)
            break
    lines = lines[index+1:]
    print SearchStuff(lines, search_str)

search_str = 'segmentC'  # set this string accordingly
main(search_str)
Depending on the complexity you are looking for, you can range from a simple state machine with line-based pattern searching to a full lexer.
Line based search
The example below assumes that you are only looking for one segment, and that segmentC { and the closing } are each on a single line.
def parsesegment(fh):
    # Yields all lines inside "segmentC"
    state = "out"
    for line in fh:
        line = line.strip()  # in case there are whitespaces around
        if state == "out":
            if line.startswith("segmentC {"):
                state = "in"
                continue  # nothing else to read on the opening line
        elif state == "in":
            if line.startswith("}"):
                state = "out"
                break  # only one segment is expected, so we can stop here
            # Work on the specific lines here
            yield line

with open(...) as fh:
    for line in parsesegment(fh):
        # do something
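For instance, against the sample file from the question (here assumed to be saved as segments.txt), the generator yields just the lines between segmentC { and the matching }:
# Hypothetical usage; prints "content Ca", "content Cb", "content Cc", "......"
with open("segments.txt") as fh:
    for line in parsesegment(fh):
        print(line)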
Simple Lexer
If you need more flexibility, you can design a simple lexer/parser pair. For example, the following code makes no assumption about how the syntax is organised across lines. It also ignores unknown patterns, which a typical lexer does not (normally it should raise a syntax error):
import re

class ParseSegment:
    # Dictionary of patterns per state
    # Tuples are (token name, pattern, state change command)
    _regexes = {
        "out": [
            ("open", re.compile(r"segment(?P<segment>\w+)\s+\{"), "in")
        ],
        "in": [
            ("close", re.compile(r"\}"), "out"),
            # Here an example of what you could want to match
            ("content", re.compile(r"content\s+(?P<content>\w+)"), None)
        ]
    }

    def lex(self, source, initpos=0):
        pos = initpos
        end = len(source)
        state = "out"
        while pos < end:
            for token_name, reg, state_chng in self._regexes[state]:
                # Try to get a match
                match = reg.match(source, pos)
                if match:
                    # Advance according to how much was matched
                    pos = match.end()
                    # yield a token if it has a name
                    if token_name is not None:
                        # Yield token name, the full matched part of source
                        # and the match grouped according to (?P<tag>) tags
                        yield (token_name, match.group(), match.groupdict())
                    # Switch state if requested
                    if state_chng is not None:
                        state = state_chng
                    break
            else:
                # No match, advance by one character
                # This is particular to that lexer, usually no match means
                # the input file has an error in the syntax and lexer should
                # yield an exception
                pos += 1

    def parse(self, source, initpos=0):
        # This is an example of use of the lexer with a parser
        # This converts the input file into a dictionary. Keys are segment
        # names, and values are list of contents.
        segments = {}
        cur_segment = None
        # Use lexer to get tokens from source
        for token, fullmatch, groups in self.lex(source, initpos):
            # On open, create the list of content in segments
            if token == "open":
                cur_segment = groups["segment"]
                segments[cur_segment] = []
            # On content, ensure we know the segment and add content to the
            # list
            elif token == "content":
                if cur_segment is None:
                    raise RuntimeError("Content found outside a segment")
                segments[cur_segment].append(groups["content"])
            # On close, set the current segment to unknown
            elif token == "close":
                cur_segment = None
            # ignore unknown tokens, we could raise an error instead
        return segments

def main():
    with open("...", "r") as fh:
        data = fh.read()
    lexer = ParseSegment()
    segments = lexer.parse(data)
    print(segments)
    return 0

if __name__ == '__main__':
    main()
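For instance, fed the sample file from the question (again assuming it is saved as segments.txt), parse() would return a dict mapping segment names to their content suffixes:
lexer = ParseSegment()
with open("segments.txt") as fh:   # assumed name for the sample file above
    segments = lexer.parse(fh.read())
# segments == {'A': ['Aa', 'Ab', 'Ac'], 'B': ['Ba', 'Bb', 'Bc'], 'C': ['Ca', 'Cb', 'Cc']}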
Full Lexer
Then, if you need even more flexibility and reusability, you will have to create a full parser. No need to reinvent the wheel: have a look at this list of language parsing modules, and you will probably find the one that suits you.

Parsing chat messages as config

I'm trying to write a function that can parse a file of defined messages for a set of replies, but I'm at a loss on how to do so.
For example the config file would look:
[Message 1]
1: Hey
How are you?
2: Good, today is a good day.
3: What do you have planned?
Anything special?
4: I am busy working, so nothing in particular.
My calendar is full.
Each new line without a number preceding it is considered part of the reply, just another message in the conversation without waiting for a response.
Thanks
Edit: The config file will contain multiple messages, and I would like to be able to randomly select from them all. Maybe store each reply of a conversation as a list; the replies with extra messages can keep the newline, and I can just split them on it later. I'm not really sure what the best approach would be.
Update:
For the most part, I've got this coded up so far:
import re

def parseMessages(filename):
    messages = {}
    begin_message = lambda x: re.match(r'^(\d)\: (.+)', x)
    with open(filename) as f:
        for line in f:
            m = re.match(r'^\[(.+)\]$', line)
            if m:
                index = m.group(1)
            elif begin_message(line):
                begin = begin_message(line).group(2)
            else:
                cont = line.strip()
        else:
            pass  # ?? -- this is the part I'm stuck on
    return messages
But now I am stuck on how to store them into the dict the way I'd like.
How would I get this to store a dict like:
{'Message 1':
    {'1': 'Hey\nHow are you?',
     '2': 'Good, today is a good day.',
     '3': 'What do you have planned?\nAnything special?',
     '4': 'I am busy working, so nothing in particular.\nMy calendar is full.'
    }
}
Or if anyone has a better idea, I'm open for suggestions.
Once again, thanks.
Update Two
Here is my final code:
import re

def parseMessages(filename):
    all_messages = {}
    num = None
    begin_message = lambda x: re.match(r'^(\d)\: (.+)', x)
    with open(filename) as f:
        messages = {}
        message = []
        for line in f:
            m = re.match(r'^\[(.+)\]$', line)
            if m:
                index = m.group(1)
            elif begin_message(line):
                if num:
                    messages.update({num: '\n'.join(message)})
                    all_messages.update({index: messages})
                    del message[:]
                num = int(begin_message(line).group(1))
                begin = begin_message(line).group(2)
                message.append(begin)
            else:
                cont = line.strip()
                if cont:
                    message.append(cont)
    # flush the final reply as well, otherwise the last key is lost
    if num:
        messages.update({num: '\n'.join(message)})
        all_messages.update({index: messages})
    return all_messages
Doesn't sound too difficult. Almost-Python pseudocode:
for line in configFile:
    strip comments from line
    if line looks like a section separator:
        section = matched section
    elif line looks like the beginning of a reply:
        append line to replies[section]
    else:
        append line to last reply in replies[section][-1]
You may want to use the re module for the "looks like" operation. :)
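For reference, a minimal runnable version of that pseudocode might look like this; the use of defaultdict and the exact regexes are my choices, not something prescribed by the pseudocode:
import re
from collections import defaultdict

def parse_replies(path):
    # Returns {section: [reply, reply, ...]}; each reply may span several lines.
    replies = defaultdict(list)
    section = None
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue
            m = re.match(r"^\[(.+)\]$", line)     # section separator, e.g. [Message 1]
            if m:
                section = m.group(1)
            elif re.match(r"^\d+:\s", line):      # beginning of a reply, e.g. "1: Hey"
                replies[section].append(line.split(": ", 1)[1])
            else:                                 # continuation of the previous reply
                replies[section][-1] += "\n" + line.strip()
    return dict(replies)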
If you have a relatively small number of strings, why not just supply them as string literals in a dict?
{'How are you?' : 'Good, today is a good day.'}
