From text file to dictionary - python

I'm reading a txt file, taking the strings, and making the first one the key for a dictionary I'm creating; the rest will be the values, stored as a tuple. There is a header beforehand, and I've already made my code "ignore" it at the start.
Example of txt values:
"Ronald Reagan","1981","8","69","California","Republican"
"George Bush","1989","4","64","Texas","Republican"
"Bill Clinton","1993","8","46","Arkansas","Democrat"
I want to create a dictionary that gives the following output:
{"Ronald Reagan": (1981,8,69,"California", "Republican") etc.}
This is what I currently have as my code:
def read_file(filename):
    d={}
    f= open(filename,"r")
    first_line = f.readline()
    for line in f:
        #line=line.strip('"')
        #line=line.rstrip()
        data=line.split('"')
        data=line.replace('"', "")
        print(data)
        key_data=data[0]
        values_data= data[1:]
        valuesindata=tuple(values_data)
        d[key_data]=valuesindata
    print(d)
read_file(filename)
The first print statement (I put it there just to see what the output was at that point) gave me the following:
Ronald Reagan,1981,8,69,California,Republican
George Bush,1989,4,64,Texas,Republican
etc. By the time it gets to the second print statement it does the following:
{'R': ('o', 'n', 'a', 'l', 'd', ' ', 'R', 'e', 'a', 'g', 'a', 'n', ',', '1', '9', '8', '1', ',', '8', ',', '6', '9', ',', 'C', 'a', 'l', 'i', 'f', 'o', 'r', 'n', 'i', 'a', ',', 'R', 'e', 'p', 'u', 'b', 'l', 'i', 'c', 'a', 'n', '\n'), 'G': ('e', 'o', 'r', 'g', 'e', ' ', 'B', 'u', 's', 'h', ',', '1', '9', '8', '9', ',', '4', ',', '6', '4', ',', 'T', 'e', 'x', 'a', 's', ',', 'R', 'e', 'p', 'u', 'b', 'l', 'i', 'c', 'a', 'n', '\n')}
Also, I'm splitting it at the quotes because some of my strings contain a comma as part of the name, example : "Carl, Jr."
I'm not wanting to import the csv module, so is there a way to do that?

You can use the csv module like alecxe suggested or you can do it "manually" like so:
csv_dict = {}
with open(csv_file, 'r') as f:
    for line in f:
        line = line.strip().replace('"', '').split(',')
        csv_dict[line[0]] = tuple(int(x) if x.isdigit() else str(x) for x in line[1:])
This will remove the double quotes, cast numerical values to int and create a dictionary of tuples.
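The version above splits on every comma, so a name such as "Carl, Jr." would be cut in two. A minimal sketch that avoids this without importing csv, assuming every field is wrapped in double quotes, fields are separated by exactly "," with no spaces, and no field contains an escaped quote (parse_line, read_records and the "Carl, Jr." row are hypothetical, made up for illustration):

```python
def parse_line(line):
    # Drop the newline and the outermost quotes, then split on the
    # three-character separator '","' so commas inside a field survive.
    fields = line.strip().strip('"').split('","')
    key = fields[0]
    # Cast purely numeric fields to int, leave the rest as strings.
    values = tuple(int(v) if v.isdigit() else v for v in fields[1:])
    return key, values


def read_records(lines):
    d = {}
    for line in lines:
        if line.strip():  # skip blank lines
            key, values = parse_line(line)
            d[key] = values
    return d


# Hypothetical sample rows; the second one has a comma inside the name.
sample = [
    '"Ronald Reagan","1981","8","69","California","Republican"\n',
    '"Carl, Jr.","1989","4","64","Texas","Republican"\n',
]
print(read_records(sample))
```

To use it on a real file, you could skip the header with f.readline() and then pass the open file object to read_records, since iterating over a file yields its lines.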

The major problem in your code, and the cause of this weird result, is that the data variable is a string: data[0] gives you the first character and data[1:] the rest. You need to call split(",") first to split the string into a list.
I have a limitation to not import any modules.
The idea is to use split(",") to split each line into individual items and strip() to remove the quotes around the item values:
d = {}
with open(filename) as f:
    for line in f:
        items = [item.strip('"').strip() for item in line.split(",")]
        d[items[0]] = items[1:]
print(d)
Prints:
{'Bill Clinton': ['1993', '8', '46', 'Arkansas', 'Democrat'],
'George Bush': ['1989', '4', '64', 'Texas', 'Republican'],
'Ronald Reagan': ['1981', '8', '69', 'California', 'Republican']}
FYI, using the csv module from the standard library would make things much easier:
import csv
from pprint import pprint
d = {}
with open(filename) as f:
    reader = csv.reader(f)
    for row in reader:
        d[row[0]] = row[1:]
pprint(d)
You can also use a dictionary comprehension:
d = {row[0]: row[1:] for row in reader}

Related

How can I extract words from a file from a string

I'm trying to find which words from a file containing every English word appear in a string of random letters that a generator gives me. Then I would like to add the found words to a list. But I'm having trouble getting this result. Could you help me, please?
This is what I have tried:
import string
import random
def gen():
    b = []
    for i in range(100):
        a = random.choice(string.ascii_lowercase)
        b.append(a)
    with open('allEnglishWords.txt') as f:
        words = f.read().splitlines()
        joined = ''.join([str(elem) for elem in b])
        if joined in words:
            print(joined)
        f.close()
    print(joined)
gen()
If you are wondering where I got the txt file, it is located here: http://www.gwicks.net/dictionaries.htm. I downloaded the text file labeled ENGLISH - 84,000 words.
import string
import random
b = []
for i in range(100):
    a = random.choice(string.ascii_lowercase)
    b.append(a)
b = ''.join(b)
with open('engmix.txt', 'r') as f:
    words = [x.replace('\n', '') for x in f.readlines()]
output=[]
for word in words:
    if word in b:
        output.append(word)
print(output)
Output:
['a', 'ad', 'am', 'an', 'ape', 'au', 'b', 'bi', 'bim', 'c', 'cb', 'd', 'e',
'ed', 'em', 'eo', 'f', 'fa', 'fy', 'g', 'gam', 'gem', 'go', 'gov', 'h',
'i', 'j', 'k', 'kg', 'ko', 'l', 'le', 'lei', 'm', 'mg', 'ml', 'mr', 'n',
'no', 'o', 'om', 'os', 'p', 'pe', 'pea', 'pew', 'q', 'ql', 'r', 's', 'si',
't', 'ta', 'tap', 'tape', 'te', 'u', 'uht', 'uk', 'v', 'w', 'wan', 'x', 'y',
'yo', 'yom', 'z', 'zed']
Focusing on acquiring this result, assume your words are separated by a single space:
with open("allEnglishWords.txt") as f:
    for line in f:
        for word in line.split(" "):
            print(word)
Also, you don't need f.close() inside a with block.
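The substring check used in the answers can be wrapped in a function so that it can be tried without the word file. This is only a sketch: random_letters, find_words and the min_len parameter are hypothetical names, and the short word list below stands in for the contents of the downloaded file.

```python
import random
import string


def random_letters(n=100):
    # Build a string of n random lowercase letters.
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(n))


def find_words(letters, words, min_len=1):
    # Keep every word that occurs as a contiguous substring of letters.
    # The length check also drops empty lines, which would otherwise
    # always match because '' is a substring of any string.
    return [w for w in words if len(w) >= min_len and w in letters]


# Fixed input instead of random_letters() so the result is repeatable.
print(find_words("xxtapeyy", ["tape", "ape", "zed", "a"], min_len=2))
```

With real data you would pass random_letters() as the first argument and the list read from the word file as the second. Raising min_len is one way to hide the one-letter and two-letter entries that dominate the output shown above.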

Python - Read two letters in table from string

I'm writing an ADFGVX cipher encoder and decoder, and I have a Polybius square from which I need to read two letters at a time.
reversesquare = {'AA': '1', 'AD': '2', 'AF': '3', 'AG': '4', 'AV': '5', 'AX': '6',
                 'DA': '7', 'DD': '8', 'DF': '9', 'DG': '0', 'DV': 'Q', 'DX': 'W',
                 'FA': 'E', 'FD': 'R', 'FF': 'T', 'FG': 'Y', 'FV': 'U', 'FX': 'I',
                 'GA': 'O', 'GD': 'P', 'GF': 'A', 'GG': 'S', 'GV': 'D', 'GX': 'F',
                 'VA': 'G', 'VD': 'H', 'VF': 'J', 'VG': 'K', 'VV': 'L', 'VX': 'Z',
                 'XA': 'X', 'XD': 'C', 'XF': 'V', 'XG': 'B', 'XV': 'N', 'XX': 'M'}
def Decrypt_Final(sortedcipher):
    mensagemcifrada = ""
    for letter in sortedcipher: # This reads letter by letter when it should read 2 at a time
        if letter in reversesquare:
            mensagemcifrada += (reversesquare[letter])
    return mensagemcifrada
I have this function which is gonna take a string (sortedcipher) like: GXFXVVFXGDFA
I want my program to read two letters each time it loops over the string, like "GX", "FX", ..., and look each pair up in my reversesquare.
A "one-liner", though less readable, would be the following.
def Decrypt_Final(sortedcipher):
    return "".join([
        reversesquare[sortedcipher[i:i+2]]
        for i in range(0, len(sortedcipher), 2)
    ])
You could zip the even-indexed characters of the string with the odd-indexed ones like this:
for a, b in zip(sortedcipher[::2], sortedcipher[1::2]):
    pair = a + b
    # pair now contains "GX", "FX" and so on..
Another option is to use a list comprehension:
def Decrypt_Final(sortedcipher):
    # splits the cipher into consecutive strings where each element
    # contains two characters from the string
    letters = (sortedcipher[i:i+2] for i in range(0, len(sortedcipher), 2))
    # note: this returns a list of decoded characters, not a single string
    return [reversesquare[letter] for letter in letters]
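For reference, here is a self-contained sketch of the pair-wise lookup that returns a string. It rebuilds the same table as the question from its row order (the SYMBOLS and ALPHABET constants and the decrypt_pairs name are hypothetical helpers, not part of the original code) and rejects input with an odd number of letters, which is an added assumption about how bad input should be handled.

```python
from itertools import product

SYMBOLS = "ADFGVX"
# Cell contents of the question's square, read row by row.
ALPHABET = "1234567890QWERTYUIOPASDFGHJKLZXCVBNM"

# product() yields ('A','A'), ('A','D'), ... in the same row-major order,
# so zipping it with ALPHABET reproduces the reversesquare dictionary.
reversesquare = {a + b: ch for (a, b), ch in zip(product(SYMBOLS, repeat=2), ALPHABET)}


def decrypt_pairs(sortedcipher):
    if len(sortedcipher) % 2:
        raise ValueError("cipher text must have an even number of letters")
    # Step through the string two characters at a time.
    pairs = (sortedcipher[i:i + 2] for i in range(0, len(sortedcipher), 2))
    return "".join(reversesquare[pair] for pair in pairs)


print(decrypt_pairs("GXFXVVFXGDFA"))
```

A pair that is not in the table raises KeyError here; whether to skip such pairs instead, as the loop in the question does, is a design choice.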

How can I filter out a List from a .txt file using list comprehension?

I am taking a Python class and I can't figure out a take home quiz. I am using IDLE to write the code.
We have to load a file called names.txt into a list. The file contains the following:
Joe Smith
000000
Jeff Mitchell
xxxxxxx
Benjamin Grant
12346
I need to filter out lines that contain the "xxxxxxx" or numbers. I am attempting to use list comprehension with the following code:
names = open(r'C:\Users\abcdse\Documents\Python\names1.txt','r')
names_contents = names.read()
filtered_names = [n for n in names_contents if n !='xxxxxxx']
names.close()
print(filtered_names)
However, when I print the filtered_names output, names are not being filtered and rather than appearing in a dropdown format, they appear like this:
['J', 'o', 'e', ' ', 'S', 'm', 'i', 't', 'h', '\n', '0', '0', '0', '0', '0', '0', '\n', 'J', 'e', 'f', 'f', ' ', 'M', 'i', 't', 'c', 'h', 'e', 'l', 'l', '\n', 'x', 'x', 'x', 'x', 'x', 'x', 'x', '\n', 'B', 'e', 'n', 'j', 'a', 'm', 'i', 'n', ' ', 'G', 'r', 'a', 'n', 't', '\n', '1', '2', '3', '4', '6', '\n']
What am I doing wrong here? Is it possible to filter out both, the "xxxxxxx" and numbers?
Thank you for your support as I get started with code.
You were almost there
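names = open(r'C:\Users\abcdsed\Documents\Python\names1.txt','r')
name_contents = names.read().splitlines() # list of lines without the trailing '\n'
names.close()
# keep a line only if it is neither all digits nor the 'xxxxxxx' marker
filtered_names = [n for n in name_contents if not n.isnumeric() and n != 'xxxxxxx']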
Might want to look things up using your favorite search engine before posting here though. This is a very trivial question.
You can use readlines to read the data and list comprehension to filter out xxx
ss = '''
Joe Smith
000000
Jeff Mitchell
xxxxxxx
Benjamin Grant
12346
'''.strip()
with open('names.txt','w') as f: f.write(ss) # write data file
###############################
with open('names.txt') as f:
    lns = f.readlines()
    xx = [ln.strip() for ln in lns if ln.strip() != 'xxxxxxx']
print('\n'.join(xx))
Output
Joe Smith
000000
Jeff Mitchell
Benjamin Grant
12346
names_contents is a string, so iterating over it gives you one character at a time, and n != 'xxxxxxx' compares a single character against the whole string. So first you have to split the string into a list of strings, one per line. Try this:
lines = names_contents.split("\n")
filtered_names = [n for n in lines if n !='xxxxxxx']
The values you want to remove
filter_vals = ['xxxxxxx\n']
Read the file
with open('64797525.txt') as f:
    out = [i.strip() for i in f.readlines() if i not in filter_vals] # remove what's in the list
print(out)
['Joe Smith', '000000', 'Jeff Mitchell', 'Benjamin Grant', '12346']
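The question also asks about dropping the numeric lines. A minimal sketch that filters both kinds of line in one comprehension, assuming "numbers" means lines made up only of digits (filter_names and its marker parameter are hypothetical names, and the sample list stands in for the lines read from the file):

```python
def filter_names(lines, marker='xxxxxxx'):
    # Strip the newline first so the comparisons see the bare text.
    cleaned = (ln.strip() for ln in lines)
    # Keep non-empty lines that are neither the marker nor all digits.
    return [ln for ln in cleaned if ln and ln != marker and not ln.isdigit()]


sample = ['Joe Smith\n', '000000\n', 'Jeff Mitchell\n',
          'xxxxxxx\n', 'Benjamin Grant\n', '12346\n']
print(filter_names(sample))
```

With a real file you could pass the open file object directly, for example filter_names(f) inside a with open(...) as f: block.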

Morse code decoder python function [duplicate]

I'm trying to create a function that takes Morse code as input in string format and returns the decoded message, also as a string.
I've worked out that I need to split the string wherever there is a space to get each individual Morse character, and use a loop to return a value when it matches a dictionary key. I'm a beginner and going really wrong somewhere. Thanks in advance.
code_dict = {'.-...': '&', '--..--': ',', '....-': '4', '.....': '5',
             '...---...': 'SOS', '-...': 'B', '-..-': 'X', '.-.': 'R',
             '.--': 'W', '..---': '2', '.-': 'A', '..': 'I', '..-.': 'F',
             '.': 'E', '.-..': 'L', '...': 'S', '..-': 'U', '..--..': '?',
             '.----': '1', '-.-': 'K', '-..': 'D', '-....': '6', '-...-': '=',
             '---': 'O', '.--.': 'P', '.-.-.-': '.', '--': 'M', '-.': 'N',
             '....': 'H', '.----.': "'", '...-': 'V', '--...': '7', '-.-.-.': ';',
             '-....-': '-', '..--.-': '_', '-.--.-': ')', '-.-.--': '!', '--.': 'G',
             '--.-': 'Q', '--..': 'Z', '-..-.': '/', '.-.-.': '+', '-.-.': 'C', '---...': ':',
             '-.--': 'Y', '-': 'T', '.--.-.': '#', '...-..-': '$', '.---': 'J', '-----': '0',
             '----.': '9', '.-..-.': '"', '-.--.': '(', '---..': '8', '...--': '3'
             }
def decodeMorse(morseCode):
    for item in morseCode.split(' '):
        return code_dict.get(item)
My problem is that it only decodes the first character of the string entered in Morse.
return something instantly ends the function. You stop processing input after the first character.
In other languages, you could instead create a list (array) with results, and return that:
def decodeMorse(morseCode):
    results = []
    for item in morseCode.split(' '):
        results.append(code_dict.get(item))
    return results
Or, as @Bakuriu suggested:
def decodeMorse(morseCode):
    return [code_dict.get(item) for item in morseCode.split(' ')]
There is, however, a simple flaw with this approach -- it decodes the whole string at once, even if you only need the first few characters.
We can do better in Python.
Use yield instead of return:
def decodeMorse(morseCode):
    for item in morseCode.split(' '):
        yield code_dict.get(item)
Now, instead of returning the whole list at once, the function returns a generator which yields one character at a time. If you don't need the whole translation, it's likely to be faster. It'll also use less memory (you don't need to construct and keep in memory a list of all the characters).
You can convert the generator into a list (list(decodeMorse('... --- ...'))) or into a string (''.join(decodeMorse('... --- ...'))) if you need to. You can also just iterate over it like over a sequence:
>>> decoded = decodeMorse('... --- ...')
>>> for char in decoded:
...     print(char)
...
S
O
S
>>>
...except, you can only do it once:
>>> for char in decoded:
...     print(char)
...
>>>
...because generators are disposable.
If you need to iterate over it another time, store it in list, or create another generator by calling decodeMorse again.
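Since the question asks for the decoded message as a string, a small wrapper can join the decoded characters. This is only a sketch under stated assumptions: the three-entry table is a shortened stand-in for the full code_dict above, and decode_morse_to_str and its unknown placeholder are hypothetical names. The placeholder matters because code_dict.get returns None for an unknown code, and ''.join fails on None.

```python
# Shortened stand-in for the full code_dict (assumption: only these
# three codes are needed for the demonstration).
code_dict = {'...': 'S', '---': 'O', '.': 'E'}


def decode_morse_to_str(morse_code, unknown='?'):
    # split() with no argument also tolerates repeated spaces.
    # Codes missing from the table become the placeholder.
    return ''.join(code_dict.get(item, unknown) for item in morse_code.split())


print(decode_morse_to_str('... --- ...'))
```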

Python MemoryError

I have a small Python script that generates a wordlist from given chars, but I always get a MemoryError after running it. Why is the result stored in RAM? Is there a better way to write the code that doesn't fill up RAM but still gives working output?
from itertools import product
chars = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'k',
         'm', 'n', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
         'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G',
         'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S',
         'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '1', '2', '3',
         '4', '5', '6', '7', '8', '9']
length = 8
result = ["".join(item) for item in product(*[chars]*length)]
for item in result:
    print(item)
By putting square brackets around your generator, you tell Python to turn it into an actual list, in-memory. You don't really need all of the elements at once, do you?
Instead, turn your square brackets into parentheses and Python will keep it a generator, which will yield items only when requested:
>>> ("".join(item) for item in product(*[chars]*length))
<generator object <genexpr> at 0x2d9cb40>
>>> ["".join(item) for item in product(*[chars]*length)]
[1] 3245 killed ipython2
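Take a look at the string module. It has a bunch of helpful constants. Note that string.ascii_letters + string.digits contains 62 characters, including the i, l, o, I, O and 0 that your list leaves out:
import string
from itertools import product
chars = string.ascii_letters + string.digits  # string.letters exists only in Python 2
length = 8
result = (''.join(item) for item in product(chars, repeat=length))
for item in result:
    print(item)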
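To see that the lazy version really produces one word at a time, you can take just the first few items with itertools.islice. A minimal sketch, assuming a tiny three-letter alphabet for the demonstration (the wordlist function name is hypothetical):

```python
from itertools import islice, product


def wordlist(chars, length):
    # Generator: builds one candidate at a time instead of a full list.
    for item in product(chars, repeat=length):
        yield ''.join(item)


chars = 'abc'  # hypothetical tiny alphabet, not the one from the question
# Only the first three words are ever constructed here.
print(list(islice(wordlist(chars, 8), 3)))
# Total number of words the generator would produce if exhausted.
print(len(chars) ** 8)
```

With the 56 characters from the question, 56 ** 8 is roughly 9.7 × 10^13 words, so the generator fixes the MemoryError but printing every word would still take a very long time.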
