Nested lists: Append string in list to list before - python

As an exercise, I want to try analyzing a WhatsApp chat of mine. I opened the .txt file, used reader() and list() on it, and removed the blank lines/lists. The remaining lists have the following format: chat = [[01.01.2019, 12:00 - name1: message1][message2] … ]
I would like to take the lists that only contain messages (not date, time and name) and merge them with the list that came just before it.
This is how it should look in the end:
chat = [[01.01.2019, 12:00 - name1: message1 message2] … ]
I tried the following loop: if the list does not begin with a number, its content is stored in a variable. However, nothing gets appended, and when the loop is done the variable only holds the last message-only list.
for row in chat: # add to row before if no date in line
    row = list(row)
    without = ""
    if row[0].isalpha():
        without = row[0]
    else:
        row.append(without)
Thanks in advance :)

Take a complicated task, and break it up into different easy tasks.
This is an example of a generator that reads from a multi-line source, and outputs the actual lines you want, with some formatting to handle newlines.
# this is the condition from your code
def is_new_line(line):
    tokens = list(line)
    if tokens and not tokens[0].isalpha():
        return True
    return False

# this is a generator that takes multiline chats and outputs full rows without newlines
def line_generator(chat):
    row = []
    for line in chat:
        if is_new_line(line):
            if row:
                yield ' '.join(row)
            row = [line.rstrip()]
        else:
            row.append(line.rstrip())
    if row:
        yield ' '.join(row)

# sample data
chat = ['1 one\n', 'two\n', 'three\n', '2 one\n', 'two\n', 'three\n']

# the generator just outputs the rows as you want them defined
for row in line_generator(chat):
    print(row)
1 one two three
2 one two three
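The same idea can be sketched for the chat format shown in the question. This is only an illustration: the regular expression assumes every new message starts with "DD.MM.YYYY, HH:MM - ", which is taken from the question's example and may differ in other WhatsApp exports. The names MESSAGE_START and merge_continuations are made up for this sketch.

```python
import re

# Assumed prefix of a new message: "DD.MM.YYYY, HH:MM - " (from the question's example).
MESSAGE_START = re.compile(r"^\d{2}\.\d{2}\.\d{4}, \d{2}:\d{2} - ")

def merge_continuations(lines):
    """Join lines without a date prefix onto the message that precedes them."""
    merged = []
    for line in lines:
        text = line.rstrip("\n")
        if MESSAGE_START.match(text) or not merged:
            # A new message starts here (or there is nothing to append to yet).
            merged.append(text)
        else:
            # Continuation line: append it to the previous message.
            merged[-1] += " " + text
    return merged

sample = [
    "01.01.2019, 12:00 - name1: message1\n",
    "message2\n",
    "01.01.2019, 12:05 - name2: message3\n",
]
merged_chat = merge_continuations(sample)
print(merged_chat)
```

Matching on the date pattern rather than on "first character is not a letter" avoids treating a continuation line that happens to start with a digit as a new message.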

Related

How to combine list items into a dictionary where some list items have the same key?

This is the file that I am working with called file1.txt
20
Gunsmoke
30
The Simpsons
10
Will & Grace
14
Dallas
20
Law & Order
12
Murder, She Wrote
And here is my code so far:
file = open('file1.txt')
lines = file.readlines()
print(lines)
new_list=[]
for i in lines:
    new = i.strip()
    new_list.append(new)
print(new_list)
new_dict = {}
for i in range(0,len(new_list),2):
    new_dict[new_list[i]]=new_list[i+1]
    if i in new_dict:
        i[key] = i.values()
new_dict = dict(sorted(new_dict.items()))
print(new_dict)
file_2 = open('output_keys.txt', 'w')
for x, y in new_dict.items():
    print(x, y)
    file_2.write(x + ': ')
    file_2.write(y)
    file_2.write('\n')
file_2.close()
file_3 = open('output_titles.txt', 'w')
new_list2 = []
for x, y in new_dict.items():
    new_list2.append(y)
    new_list2.sort()
    print(new_list2)
print(new_list2)
for i in new_list2:
    file_3.write(i)
    file_3.write('\n')
    print(i)
file_3.close()
The instructions state:
Write a program that first reads in the name of an input file and then reads the input file using the file.readlines() method. The input file contains an unsorted list of number of seasons followed by the corresponding TV show. Your program should put the contents of the input file into a dictionary where the number of seasons are the keys, and a list of TV shows are the values (since multiple shows could have the same number of seasons).
Sort the dictionary by key (least to greatest) and output the results to a file named output_keys.txt. Separate multiple TV shows associated with the same key with a semicolon (;), ordering by appearance in the input file. Next, sort the dictionary by values (alphabetical order), and output the results to a file named output_titles.txt.
So I am having trouble with two parts:
First is "Separate multiple TV shows associated with the same key with a semicolon (;)".
What I have written so far just replaces the new item in the dictionary.
for i in range(0,len(new_list),2):
    new_dict[new_list[i]]=new_list[i+1]
    if i in new_dict:
        i[key] = i.values()
The 2nd part is that in the Zybooks program it seems to add onto output_keys.txt and output_title.txt every time it iterates. But my code does not seem to add to output_keys and output_title. For example, if after I run file1.txt I then try to run file2.txt, it replaces output_keys and output_title instead of adding to it.
Try to break down the problem into smaller sub-problems. Right now, it seems like you're trying to solve everything at once. E.g., I'd suggest you omit the file input and output and focus on the basic functionality of the program. Once that is set, you can go for the I/O.
You first need to create a dictionary with numbers of seasons as keys and a list of tv shows as values. You almost got it; here's a working snippet (I renamed some of your variables: it's always a good idea to have meaningful variable names):
lines = file.readlines()
# formerly "new_list"
clean_lines = []
for line in lines:
    line = line.strip()
    clean_lines.append(line)
# formerly "new_dict"
seasons = {}
for i in range(0, len(clean_lines), 2):
    season_num = int(clean_lines[i])
    series = clean_lines[i+1]
    # there are only two options: either
    # the season_num is already in the dict...
    if season_num in seasons:
        # append to the existing entry
        seasons[season_num].append(series)
    # ...or it isn't
    else:
        # make a new entry with a list containing
        # the series
        seasons[season_num] = [series]
Here's how you can print the resulting dictionary with the tv shows separated by semicolon using join. Adapt to your needs:
for season_num, series in seasons.items():
    print(season_num, '; '.join(series))
Output:
20 Gunsmoke; Law & Order
30 The Simpsons
10 Will & Grace
14 Dallas
12 Murder, She Wrote
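To connect this with the two output files from the assignment, here is a minimal sketch of the remaining steps. The function names (build_seasons, key_lines, title_lines) are made up for illustration, and the "key: shows" line format is copied from the question's own code; check it against what the exercise actually expects.

```python
from collections import defaultdict

def build_seasons(lines):
    """Map number of seasons -> list of shows, in order of appearance."""
    cleaned = [line.strip() for line in lines if line.strip()]
    seasons = defaultdict(list)
    # Lines come in pairs: a season count followed by a title.
    for i in range(0, len(cleaned) - 1, 2):
        seasons[int(cleaned[i])].append(cleaned[i + 1])
    return dict(seasons)

def key_lines(seasons):
    """Lines for output_keys.txt: sorted by key, shows joined with '; '."""
    return ["{}: {}".format(num, "; ".join(shows))
            for num, shows in sorted(seasons.items())]

def title_lines(seasons):
    """Lines for output_titles.txt: all titles in alphabetical order."""
    return sorted(title for shows in seasons.values() for title in shows)

sample = ["20\n", "Gunsmoke\n", "30\n", "The Simpsons\n", "10\n", "Will & Grace\n",
          "14\n", "Dallas\n", "20\n", "Law & Order\n", "12\n", "Murder, She Wrote\n"]
seasons = build_seasons(sample)
print("\n".join(key_lines(seasons)))
print("\n".join(title_lines(seasons)))
```

Each list of lines can then be written with a plain loop over file.write(line + '\n'). Regarding the second part of the question: opening a file with mode 'w' truncates it on every run, while mode 'a' appends to whatever is already there.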
As I see it, you are trying to check whether the key already exists in the dictionary, but there is a mistake: you should check the value new_list[i] rather than the index i, and you must do the check before putting anything into the dictionary. If the key already exists, you can update the current value by appending ";" and the new title to it:
for i in range(0,len(new_list),2):
    if new_list[i] not in new_dict:
        new_dict[new_list[i]] = new_list[i+1]
    else:
        # the key exists already: extend the stored value
        new_dict[new_list[i]] = new_dict[new_list[i]] + ";" + new_list[i+1]

Problem skipping line whilst iterating using previous line and current line comparison

I have a list of sorted data arranged so that each item in the list is a csv line to be written to file.
The final step of the script checks the contents of each field and if all but the last field match then it will copy the current line's last field onto the previous line's last field.
Once I have found and processed one of these matches, I would like to skip the current line (the one the field was copied from), so that only one of the two lines is left.
Here's an example set of data
field1,field2,field3,field4,something
field1,field2,field3,field4,else
Desired output
field1,field2,field3,field4,something else
This is what I have so far
output_csv = ['field1,field2,field3,field4,something',
              'field1,field2,field3,field4,else']
# run through the output
# open and create a csv file to save output
with open('output_table.csv', 'w') as f:
    previous_line = None
    part_duplicate_line = None
    part_duplicate_flag = False
    for line in output_csv:
        part_duplicate_flag = False
        if previous_line is not None:
            previous = previous_line.split(',')
            current = line.split(',')
            if (previous[0] == current[0]
                    and previous[1] == current[1]
                    and previous[2] == current[2]
                    and previous[3] == current[3]):
                print(previous[0], current[0])
                previous[4] = previous[4].replace('\n', '') + ' ' + current[4]
                part_duplicate_line = ','.join(previous)
                part_duplicate_flag = True
                f.write(part_duplicate_line)
            if part_duplicate_flag is False:
                f.write(previous_line)
        previous_line = line
At the moment the script adds the merged line but doesn't skip the next line. I've tried various renditions of continue statements after part_duplicate_line is written to file, but to no avail.
It looks like you want one entry for each combination of the first 4 fields.
You can use a dict to aggregate the data:
# First we extract the keys and values
output_csv_keys = list(map(lambda x: ','.join(x.split(',')[:-1]), output_csv))
output_csv_values = list(map(lambda x: x.split(',')[-1], output_csv))
# Then we construct a dictionary with these keys and combine the values into a list
from collections import defaultdict
output_csv_dict = defaultdict(list)
for key, value in zip(output_csv_keys, output_csv_values):
    output_csv_dict[key].append(value)
# Then we extract the key/value combinations from this dictionary into a list
for_printing = [','.join([k, ' '.join(v)]) for k, v in output_csv_dict.items()]
print(for_printing)
# Output is ['field1,field2,field3,field4,something else']
# Each entry of this list can be output to the csv file
I propose to encapsulate what you want to do in a function where the important part obeys this logic:
either join the new info to the old record
or output the old record and forget it
of course at the end of the loop we have in any case a dangling old record to output
def join(inp_fname, out_fname):
    '''Input file contains sorted records, when two (or more) records differ
    only in the last field, we join the last fields with a space
    and output only once, otherwise output the record as-is.'''
    ######################### Prepare for action ##########################
    from csv import reader, writer
    with open(inp_fname) as finp, open(out_fname, 'w') as fout:
        r, w = reader(finp), writer(fout)
        ######################### Important Part starts here ##############
        old = next(r)
        for new in r:
            if old[:-1] == new[:-1]:
                old[-1] += ' '+new[-1]
            else:
                w.writerow(old)
                old = new
        w.writerow(old)
To check what I've proposed you can use these two snippets (note that these records are shorter than yours, but it's an example and it doesn't matter because we use only -1 to index our records).
The 1st one has a "regular" last record
open('a0.csv', 'w').write('1,1,2\n1,1,3\n1,2,0\n1,3,1\n1,3,2\n3,3,0\n')
join('a0.csv', 'a1.csv')
while the 2nd has a last record that must be joined to the previous one.
open('b0.csv', 'w').write('1,1,2\n1,1,3\n1,2,0\n1,3,1\n1,3,2\n')
join('b0.csv', 'b1.csv')
If you run the snippets, as I have done before posting, in the environment where you have defined join you should get what you want.
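Since the question starts from a list of lines already in memory rather than from a file, the same "join or flush" logic can also be sketched without any file I/O. The helper name merge_consecutive is hypothetical, and the plain split(',') assumes that no field contains a quoted comma; use the csv module if that can happen.

```python
def merge_consecutive(lines, sep=","):
    """Merge consecutive lines whose fields match except for the last one."""
    merged = []  # list of field lists, one per output line
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if merged and merged[-1][:-1] == fields[:-1]:
            # Same leading fields as the previous record: extend its last field.
            merged[-1][-1] += " " + fields[-1]
        else:
            # Different record: start a new output line.
            merged.append(fields)
    return [sep.join(fields) for fields in merged]

output_csv = ['field1,field2,field3,field4,something',
              'field1,field2,field3,field4,else']
result = merge_consecutive(output_csv)
print(result)
```

Because each line is either merged into the last stored record or stored as a new one, no line is ever written twice, so there is nothing to skip.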

while iterating if statement wont evaluate

This little snippet of code is my attempt to pull multiple unique values out of rows in a CSV. The CSV header looks something like this:
descr1, fee part1, fee part2, descr2, fee part1, fee part2,
Each descr column contains many unique names. I want to take these unique fee names and make a new header out of them. To do this I decided to start by collecting all the different names in the descr columns, so that when I start pulling data from the actual rows I can check whether a row has a fee amount or one of the fee names I need. There are probably a lot of things wrong with this code, but I am a beginner. I really just want to know why my first if statement is never triggered when the l in fin equals a comma. I know it must at some point, as it writes a comma to my row string. Thanks!
row = ''
header = ''
columnames = ''
cc = ''
#fout = open(","w")
fin = open ("raw data.csv","rb")
for l in fin:
    if ',' == l:
        if 'start of cust data' not in row:
            if 'descr' in row:
                columnames = columnames + ' ' + row
                row = ''
            else:
                pass
        else:
            pass
    else:
        row = row+l
    print(columnames)
print(columnames)
When you iterate over a file, you get lines, not characters -- and they have the newline character, \n, at the end. Your if ',' == l: statement will never succeed because even if you had a line with only a single comma in it, the value of l would be ",\n".
I suggest using the csv module: you'll get much better results than trying to do this by hand like you're doing.
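A small sketch of both points, using in-memory data instead of "raw data.csv" (the sample rows and the fee names in them are invented for illustration; only the header names come from the question):

```python
import csv
import io

# Iterating over a file-like object yields whole lines, newline included.
lines = list(io.StringIO("a,b\n,\n"))
print(lines)

# Hypothetical stand-in for the real file; the data row is made up.
sample = io.StringIO(
    "descr1,fee part1,fee part2,descr2,fee part1,fee part2\n"
    "late fee,10,2,service fee,5,1\n"
)

reader = csv.reader(sample)
header = next(reader)  # first row as a list of column names
descr_columns = [i for i, name in enumerate(header) if "descr" in name]

# Collect the unique fee names found in the descr columns, in order of appearance.
fee_names = []
for row in reader:
    for i in descr_columns:
        if row[i] and row[i] not in fee_names:
            fee_names.append(row[i])

print(descr_columns)
print(fee_names)
```

With csv.reader each row arrives already split into fields, so there is no need to compare individual characters against ','.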

How to Store Rows from CSV File into Python and Print Data with HTML

Basically my problem is this: I have a CSV file (from Excel) with info on South Park characters, and I have an HTML template. What I have to do is take the data by rows (stored in lists) for each character and, using the given HTML template, create 5 separate HTML pages with the characters' last names.
Here is an image of the CSV file: i.imgur.com/rcIPW.png
This is what I have so far:
askfile = raw_input("What is the filename?")
southpark = []
filename = open(askfile, 'rU')
for row in filename:
    print row[0:105]
filename.close()
The above prints out all the info on the IDLE shell in five rows but I have to find a way to separate each row AND column and store it into a list (which I don't know how to do). It's pretty rudimentary code I know I'm trying to figure out a way to store the rows and columns first, then I will have to use a function (def) to first assign the data to the HTML template and then create an HTML file from that data/template..and I'm so far a noob I tried searching through the net but I just don't understand the stuff.
I am not allowed to use any downloadable modules but I can use things built in Python like import csv or whatnot, but really its supposed to be written with a couple functions, list, strings, and loops..
Once I figure out how to separate the rows and columns and store them then I can work on implementing into HTML template and creating the file.
I'm not trying to have my HW done for me it's just that I pretty much suck at programming so any help is appreciated!
BTW I am using Python 2.7.2 and if you want to DL the CSV file click here.
UPDATE:
Okay, thanks a lot! That helped me understand what each row was printing and what info is being read by the program. Now since I have to use functions in this program somehow this is what I was thinking.
Each row (0-6) prints out separate values, but just the print row function prints out one character and all his corresponding values which is what I need. What I want is to print out data like "print row" would but I have to store each of those 5 characters in a separate list.
Basically "print row" prints out all 5 characters with each of their corresponding attributes, how can I split each of them into 5 variables and store them as a list?
When I do print row[0] it only prints out the names, and print row[1] only prints the DOB. I was thinking of creating a function (def) that takes only what "print row" gives and splits it into 5 variables in a loop, and then another function that takes those variables/lists of data and combines them with the HTML template. At the end I have to figure out how to create HTML files in Python.
Sorry if I sound confusing just trying to make sense of it all. This is my code right now it gives an error that there are too many values to unpack but I am just trying to fiddle around and try different things and see if they work. Based on what I wanted to do above I will probably have to delete all most of this code and find a way to rewrite it with list type functions like .append or .strip, etc which I am not very familiar with..
import csv
original = file('southpark.csv', 'rU')
reader = csv.reader(original)
# List of Data
name, dob, descript, phrase, personality, character, apparel = []
count = 0
def southparkinfo():
    for row in reader:
        count += 1
        if count == 0:
            row[0] = name
            print row[0] # Name (ex. Stan Marsh)
            print "----------------"
        elif count == 1:
            row[1] = dob
            print row[1] # DOB
            print "----------------"
        elif count == 2:
            row[2] = descript
            print row[2] # Descriptive saying (ex. Respect My Authoritah!)
            print "----------------"
        elif count == 3:
            row[3] = phrase
            print row[3] # Catch Phrase (ex. Mooom!)
            print "----------------"
        elif count == 4:
            row[4] = personality
            print row[4] # Personality (ex. Jewish)
            print "----------------"
        elif count == 5:
            row[5] = character
            print row[5] # Characteristic (ex. Politically incorrect)
            print "----------------"
        elif count == 6:
            row[6] = apparel
            print row[6] # Apparel (ex. red gloves)
    return
reader.close()
First and foremost, have a look at the CSV docs.
Once you understand the basics take a look at this code. This should get you started on the right path:
import csv
original = file('southpark.csv', 'rU')
reader = csv.reader(original)
for row in reader:
    #will print each row by itself (all columns from names up to what they wear)
    print row
    print "-----------------"
    #will print first column (character names only)
    print row[0]
You want to import csv module so you can work with the CSV filetype. Open the file in universal newline mode and read it with csv.reader. Then you can use a for loop to begin iterating through the rows depending on what you want. The first print row will print a single line of all a single character's data (ie: everything from their name up to their clothing type) like so:
['Stan Marsh', 'DOB: October 19th', 'Dude!', 'Aww #$%^!', 'Star Quarterback', 'Wendy', 'red gloves']
-----------------
['Kyle Broflovski', 'DOB: May 26th', 'Kick the baby!', 'You ***!', 'Jewish', 'Canadian', 'Ushanka']
-----------------
['Eric Theodore Cartman', 'DOB: July 1', 'Respect My Authroitah!', 'Mooom!', 'Big-boned', 'Politically incorrect', 'Knit-cap!']
-----------------
['Kenny McCormick', 'DOB: March 22', 'DOD: Every other week', 'Mmff Mmff', 'MMMFFF!!!', 'Mysterion!', 'Orange Parka']
-----------------
['Leopold Butters Stotch', 'DOB:Younger than the others!', 'The 4th friend', 'Professor chaos', 'stutter', 'innocent', 'nerdy']
-----------------
Finally, the second statement, print row[0], will give you the character names only. You can change the number to grab the other data as necessary. Remember that Python list indices start at 0, so in your case you can only go up to 6 (column A=0, B=1, C=2, etc.). To see these outputs more clearly, it's probably best to comment out one of the print statements so you get a clearer picture of what you are grabbing.
-----------------
Stan Marsh
-----------------
Kyle Broflovski
-----------------
Eric Theodore Cartman
-----------------
Kenny McCormick
-----------------
Leopold Butters Stotch
Note I threw in that print "-----------------" so you would be able to see the different outputs.
Hope this helps get you off to a start.
Edit: To answer your second question: The easiest way (although probably not the best way) to grab all of a single character's info would be to do something like this:
import csv
original = file('southpark.csv', 'rU')
reader = csv.reader(original)
stan = reader.next()
kyle = reader.next()
eric = reader.next()
kenny = reader.next()
butters = reader.next()
print eric
which outputs:
['Eric Theodore Cartman', 'DOB: July 1', 'Respect My Authroitah!', 'Mooom!', 'Big-boned', 'Politically incorrect', 'Knit-cap!']
Take note that if your CSV is modified such that the order of the characters are moved (ex: butters is moved to top) you will output the info of another character.
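A sketch that avoids depending on row order is to keep every row in a list and build each page in a loop. Note the assumptions: this is written for Python 3 (the thread uses Python 2.7, where print and file handling differ), the template string is a made-up stand-in for the real HTML template, the sample rows are shortened to three columns, and naming each page after the character's last name is a guess at what the assignment wants.

```python
import csv
import html
import io

# Shortened stand-in for southpark.csv (first three columns only).
sample = io.StringIO(
    "Stan Marsh,DOB: October 19th,Dude!\n"
    "Kyle Broflovski,DOB: May 26th,Kick the baby!\n"
)

# Hypothetical template; the real one from the assignment is not shown.
TEMPLATE = "<html><body><h1>{name}</h1><p>{dob}</p><p>{saying}</p></body></html>"

def load_characters(handle):
    """Return every non-empty CSV row as a list of column values."""
    return [row for row in csv.reader(handle) if row]

def render_page(row):
    """Fill the template with one character's data, escaping HTML characters."""
    return TEMPLATE.format(
        name=html.escape(row[0]),
        dob=html.escape(row[1]),
        saying=html.escape(row[2]),
    )

characters = load_characters(sample)
pages = {}
for row in characters:
    last_name = row[0].split()[-1]  # last word of the full name
    pages[last_name + ".html"] = render_page(row)

print(sorted(pages))
```

Writing the files is then one more loop: open each file name in 'w' mode and write the matching page string.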

How do I alphabetize a file in Python?

I am trying to get a list of presidents alphabetized by last name, even though the file that it is being drawn is currently listed first name, last name, date in office, and date out of office.
Here is what I have, any help on what I need to do with this. I have searched around for some answers, and most of them are beyond my level of understanding. I feel like I am missing something small. I tried to break them all out into a list, and then sort them, but I could not get it to work, so this is where I started from.
INPUT_FILE = 'presidents.txt'
OUTPUT_FILE = 'president_NEW.txt'
OUTPUT_FILE2 = 'president_NEW2.txt'
def main():
    infile = open(INPUT_FILE)
    outfile = open(OUTPUT_FILE, 'w')
    outfile2 = open(OUTPUT_FILE2,'w')
    stuff = infile.readline()
    while stuff:
        stuff = stuff.rstrip()
        data = stuff.split('\t')
        president_First = data[1]
        president_Last = data[0]
        start_date = data[2]
        end_date = data[3]
        sentence = '%s %s was president from %s to %s' % \
            (president_First,president_Last,start_date,end_date)
        sentence2 = '%s %s was president from %s to %s' % \
            (president_Last,president_First,start_date, end_date)
        outfile2.write(sentence2+ '\n')
        outfile.write(sentence + '\n')
        stuff = infile.readline()
    infile.close()
    outfile.close()
main()
What you should do is put the presidents in a list, sort that list, and then print out the resulting list.
Before your for loop add:
presidents = []
Have this code inside the for loop after you pull out the names/dates
president = (last_name, first_name, start_date, end_date)
presidents.append(president)
After the for loop
presidents.sort() # because we put last_name first above
# it will sort by last_name
Then print it out:
for president in presidents:
    last_name, first_name, start_date, end_date = president
    string1 = "..."
It sounds like you tried to break them out into a list. If you had trouble with that, show us the code that resulted from that attempt. It was the right way to approach the problem.
Other comments:
Just a couple of points where your code could be simpler. Feel free to ignore or use this as you want:
president_First=data[1]
president_Last= data[0]
start_date=data[2]
end_date=data[3]
can be written as:
president_Last, president_First, start_date, end_date = data
And
stuff=infile.readline()
while stuff:
    stuff=stuff.rstrip()
    data=stuff.split('\t')
    ...
    stuff = infile.readline()
can be written as:
for stuff in infile:
    ...
#!/usr/bin/env python
# this sounds like a homework problem, but ...
from __future__ import with_statement # not necessary on newer versions
def main():
    # input
    with open('presidents.txt', 'r') as fi:
        # read and parse
        presidents = [[x.strip() for x in line.split(',')] for line in fi]
    # sort
    presidents = sorted(presidents, cmp=lambda x, y: cmp(x[1], y[1]))
    # output
    with open('presidents_out.txt', 'w') as fo:
        for pres in presidents:
            print >> fo, "president %s %s was president %s %s" % tuple(pres)
if __name__ == '__main__':
    main()
I tried to break them all out into a list, and then sort them
What do you mean by "them"?
Breaking up the line into a list of items is a good start: that means you treat the data as a set of values (one of which is the last name) rather than just a string. However, just sorting that list is no use; Python will take the 4 strings from the line (the first name, last name etc.) and put them in order.
What you want to do is have a list of those lists, and sort it by last name.
Python's lists provide a sort method that sorts them. When you apply it to the list of president-info-lists, it will sort those. But the default sorting for lists will compare them item-wise (first item first, then second item if the first items were equal, etc.). You want to compare by last name, which is the second element in your sublists. (That is, element 1; remember, we start counting list elements from 0.)
Fortunately, it is easy to give Python more specific instructions for sorting. We can pass the sort function a key argument, which is a function that "translates" the items into the value we want to sort them by. Yes, in Python everything is an object - including functions - so there is no problem passing a function as a parameter. So, we want to sort "by last name", so we would pass a function that accepts a president-info-list and returns the last name (i.e., element [1]).
Fortunately, this is Python, and "batteries are included"; we don't even have to write that function ourself. We are given a magical tool that creates functions that return the nth element of a sequence (which is what we want here). It's called itemgetter (because it makes a function that gets the nth item of a sequence - "item" is more usual Python terminology; "element" is a more general CS term), and it lives in the operator module.
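To make the difference concrete, here is a small sketch comparing the default item-wise sort with a sort keyed on element 1. The three sample rows are for illustration only and assume the column order first name, last name, start, end; they are not read from the asker's file.

```python
from operator import itemgetter

# Sample rows in the assumed order: first name, last name, start, end.
presidents = [
    ["George", "Washington", "1789", "1797"],
    ["John", "Adams", "1797", "1801"],
    ["Thomas", "Jefferson", "1801", "1809"],
]

# itemgetter(1) builds a function that returns element 1 of its argument.
get_last_name = itemgetter(1)
print(get_last_name(presidents[0]))

# Default sort compares the lists item by item, so the first name decides.
by_first = sorted(presidents)
# With a key function, only the last name is compared.
by_last = sorted(presidents, key=get_last_name)

print([p[1] for p in by_first])
print([p[1] for p in by_last])
```

sorted() returns a new list and leaves the original untouched, whereas list.sort() with the same key argument sorts in place.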
By the way, there are also much neater ways to handle the file opening/closing, and we don't need to write an explicit loop to handle reading the file - we can iterate directly over the file (for line in file: gives us the lines of the file in turn, one each time through the loop), and that means we can just use a list comprehension (look them up).
import operator
def main():
    # We'll set up 'infile' to refer to the opened input file, making sure it is automatically
    # closed once we're done with it. We do that with a 'with' block; we're "done with the file"
    # at the end of the block.
    with open(INPUT_FILE) as infile:
        # We want the splitted, rstripped line for each line in the infile, which is spelled:
        data = [line.rstrip().split('\t') for line in infile]
    # Now we re-arrange that data. We want to sort the data, using an item-getter for
    # item 1 (the last name) as the sort-key. That is spelled:
    data.sort(key=operator.itemgetter(1))
    # The output file must be opened for writing ('w'); the default mode is read-only.
    with open(OUTPUT_FILE, 'w') as outfile:
        # Let's say we want to write the formatted string for each line in the data.
        # Now we're taking action instead of calculating a result, so we don't want
        # a list comprehension any more - so we iterate over the items of the sorted data:
        for item in data:
            # The item already contains all the values we want to interpolate into the string,
            # in the right order; '%' needs a tuple, so we convert the list, and we add the
            # newline ourselves because write() does not:
            outfile.write('%s %s was president from %s to %s\n' % tuple(item))
I did get this working with Karl's help above, although I did have to edit the code to get it to work for me, due to some errors I was getting. I eliminated those and ended up with this.
import operator
INPUT_FILE = 'presidents.txt'
OUTPUT_FILE2= 'president_NEW2.txt'
def main():
    with open(INPUT_FILE) as infile:
        data = [line.rstrip().split('\t') for line in infile]
    data.sort(key=operator.itemgetter(0))
    # 'with' makes sure the output file is closed (and flushed) when we are done
    with open(OUTPUT_FILE2,'w') as outfile:
        for item in data:
            last=item[0]
            first=item[1]
            start=item[2]
            end=item[3]
            outfile.write('%s %s was president from %s to %s\n' % (last,first,start,end))
main()
