Removing unnecessary double quotes from keys in a dictionary - Python

The goal of my program is to create a dictionary of items (keys) and their counts (values). The keys are extracted from a text file, in which they're organized as lists.
Example: ['synonymous_variant'] ['splice_region_variant&synonymous_variant'] ['synonymous_variant'] (each list is on a new line, without any separators)
Code:
from collections import Counter
file = open('/home/becquart/Stagiaire_refinement_construct_peptides/Travail5/RE__[Allogenomics]_travail_Vcf/results.txt', 'r').read()
for char in '""-.,\n[]':
    file = file.replace(char,' ')
    for i in char:
        file = file.replace('""', ' ')
file = file.lower()
word_list = file.split()
d = dict(Counter(word_list).most_common())
print d
The output is something like: {"'coding_sequence_variant&3_prime_utr_variant'": 6, "'inframe_insertion&nmd_transcript_variant'": 17 etc.
I would like to remove the " from the keys, but I am having a hard time figuring it out, as I'm very new to programming. I would be extremely happy if I could get this solved.
Thank you in advance!
Edit:
Input file here: https://ufile.io/v1tm0
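One possible explanation, offered as a sketch rather than a definitive answer: the double quotes in the output are most likely not part of the keys. Each key still contains its single quotes, because ' is not among the characters being replaced, and Python then displays such a string wrapped in double quotes. Under that assumption, extracting only the text between each pair of single quotes keeps the quotes out of the keys. The function name count_items and the sample string are made up for illustration, and the code is written for Python 3:

```python
import re
from collections import Counter

def count_items(text):
    """Count the quoted items in text such as "['a'] ['b&c']"."""
    # Capture only what sits between each pair of single quotes,
    # so the quotes themselves never become part of a key.
    items = re.findall(r"'([^']*)'", text.lower())
    return dict(Counter(items).most_common())

# Hypothetical sample mimicking the file layout described in the question.
sample = (
    "['synonymous_variant']\n"
    "['splice_region_variant&synonymous_variant']\n"
    "['synonymous_variant']\n"
)
counts = count_items(sample)
print(counts)
```

With the real file, the text would come from reading results.txt instead of the sample string.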

Related

Convert a text file into a dictionary

I have a text file in this format:
key:object,
key2:object2,
key3:object3
How can I convert this into a dictionary in Python for the following process?
Open it
Check if string s = any key in the dictionary
If it is, then string s = the object linked to the aforementioned key.
If not, nothing happens
File closes.
I've tried the following code for dividing them with commas, but the output was incorrect. It made the combination of key and object in the text file into a single key and single object, effectively duplicating it:
Code:
file = open("foo.txt","r")
dict = {}
for line in file:
    x = line.split(",")
    a = x[0]
    b = x[0]
    dict[a] = b
Incorrect output:
key:object, key:object
key2:object2, key2:object2
key3:object3, key3:object3
Thank you
m={}
for line in file:
    x = line.strip().replace(",","") # remove surrounding whitespace and the comma if present
    y=x.split(':') #split key and value
    m[y[0]] = y[1]
# -*- coding:utf-8 -*-
key_dict={"key":'',"key5":'',"key10":''}
File=open('/home/wangxinshuo/KeyAndObject','r')
List=File.readlines()
File.close()
key=[]
for i in range(0,len(List)):
    for j in range(0,len(List[i])):
        if(List[i][j]==':'):
            if(List[i][0:j] in key_dict):
                for final_num,final_result in enumerate(List[i][j:].split(',')):
                    if(final_result!='\n'):
                        key_dict["%s"%List[i][0:j]]=final_result
print(key_dict)
I am using your file in "/home/wangxinshuo/KeyAndObject"
You can convert the content of your file to a dictionary with a one-liner similar to the one below (f is the open file object):
result = {k:v for k,v in [line.strip().replace(",","").split(":") for line in f if line.strip()]}
In case you want the dictionary values to be stripped, just add v.strip()
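For comparison, here is a longer sketch of the same idea that also covers the lookup step from the question. It assumes each line holds exactly one key:object pair with an optional trailing comma; the names parse_pairs and sample_lines are invented for the example, and the list of strings stands in for the open file:

```python
def parse_pairs(lines):
    """Build a dict from lines shaped like 'key:object,'."""
    result = {}
    for line in lines:
        # Drop the newline and an optional trailing comma.
        line = line.strip().rstrip(",")
        if not line:
            continue  # skip blank lines
        # partition splits on the first ':' only, so values may contain ':'.
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result

# Stand-in for iterating over the open file.
sample_lines = ["key:object,\n", "key2:object2,\n", "key3:object3\n"]
mapping = parse_pairs(sample_lines)

# If s matches a key, replace it with the linked object; otherwise leave it alone.
s = "key2"
s = mapping.get(s, s)
print(mapping, s)
```

Reading the real file inside a with open(...) block and passing the file object to parse_pairs would also take care of closing the file.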

how to join two text files in python

I have two text files. The first is a words file like
['i', 'hate', 'sausages', 'noa', 'hate', 'i']
and the second is a numbers file like
1, 2, 3, 3, 2, 1
Now I need to join the two files to each other.
This is what I have so far:
positionslist = open('positions.txt', 'r')  # open the file with the positions
wordslist = open('Words.txt', 'r')  # open the file with the words
positions = positionslist.read()  # read the file contents and save them in a variable
words = wordslist.read()  # read the file contents and save them in a variable
print(positions)  # prints the positions
print(words)  # prints the words
positionfinal = list(positions)  # makes the positions into a list
# this is where I need to try and connect the two together
remover = words.replace("[", "")  # replace the brackets with nothing so that it is read like a string
remover = remover.replace("]", "")  # replace the brackets with nothing so that it is read like a string
The format of the input files looks like it's Python literals (list of strings, tuple of integers), so you can use ast.literal_eval to parse it. And then you can use the built-in zip function to loop through them together and print them out. Something like this:
import ast
with open('words.txt') as f:
    words = ast.literal_eval(f.read()) # read as list of strings
with open('positions.txt') as f:
    positions = ast.literal_eval(f.read()) # read as tuple of ints
for position, word in zip(positions, words):
    print(position, word)
Here's the output:
1 i
2 hate
3 sausages
3 noa
2 hate
1 i

Looping through a dictionary in python

I am creating a main function that loops through a dictionary that has one key for all the values associated with it. I am having trouble because I cannot get the dictionary to be all lowercase. I have tried using .lower(), but to no avail. Also, the program should look at the words of the sentence, determine whether it has seen more of those words in sentences that the user has previously called "happy", "sad", or "neutral" (based on the three dictionaries), and make a guess as to which label to apply to the sentence.
An example output would look like this:
Sentence: i started screaming incoherently about 15 mins ago, this is B's attempt to calm me down.
0 appear in happy
0 appear in neutral
0 appear in sad
I think this is sad.
You think this is: sad
Okay! Updating.
CODE:
import csv
def read_csv(filename, col_list):
    """This function expects the name of a CSV file and a list of strings
    representing a subset of the headers of the columns in the file, and
    returns a dictionary of the data in those columns, as described below."""
    with open(filename, 'r') as f:
        # Better convert reader to a list (items represent every row)
        reader = list(csv.DictReader(f))
        dict1 = {}
        for col in col_list:
            dict1[col] = []
            # Going in every row of the file
            for row in reader:
                # Append to the list the row item of this key
                dict1[col].append(row[col])
    return dict1
def main():
    dictx = read_csv('words.csv', ['happy'])
    dicty = read_csv('words.csv', ['sad'])
    dictz = read_csv('words.csv', ['neutral'])
    dictxcounter = 0
    dictycounter = 0
    dictzcounter = 0
    a=str(raw_input("Sentence: ")).split(' ')
    for word in a :
        for keys in dictx['happy']:
            if word == keys:
                dictxcounter = dictxcounter + 1
        for values in dicty['sad']:
            if word == values:
                dictycounter = dictycounter + 1
        for words in dictz['neutral']:
            if word == words:
                dictzcounter = dictzcounter + 1
    print dictxcounter
    print dictycounter
    print dictzcounter
Remove this line from your code:
dict1 = dict((k, v.lower()) for k,v in col_list)
It overwrites the dictionary that you built in the loop.
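As for the lowercase comparison itself, one way to approach it is to lowercase both sides at comparison time instead of trying to lowercase the dictionary as a whole. The following is only a sketch: count_matches and the example word lists are hypothetical, punctuation is not stripped, and it is written for Python 3 (the question's code is Python 2):

```python
def count_matches(sentence, labelled_words):
    """Return {label: how many sentence words appear in that label's word list}."""
    # Lowercase the sentence once, then compare against lowercased word lists.
    words = sentence.lower().split()
    counts = {}
    for label, vocab in labelled_words.items():
        vocab_lower = {w.lower() for w in vocab}  # set gives fast membership tests
        counts[label] = sum(1 for w in words if w in vocab_lower)
    return counts

# Hypothetical word lists standing in for the columns read from words.csv.
example = {
    "happy": ["Great", "love"],
    "sad": ["Screaming", "cry"],
    "neutral": ["table"],
}
counts = count_matches("I started SCREAMING and wanted to cry", example)
guess = max(counts, key=counts.get)  # label with the highest count
print(counts, guess)
```

The dictionary returned by read_csv has the same shape as example, so it could be passed in directly if all three columns are requested in one call.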

Linear search to find spelling errors in Python

I'm working on learning Python with Program Arcade Games and I've gotten stuck on one of the labs.
I'm supposed to compare each word of a text file (http://programarcadegames.com/python_examples/en/AliceInWonderLand200.txt) to find if it is not in the dictionary file (http://programarcadegames.com/python_examples/en/dictionary.txt) and then print it out if it is not. I am supposed to use a linear search for this.
The problem is even words I know are not in the dictionary file aren't being printed out. Any help would be appreciated.
My code is as follows:
# Imports regular expressions
import re
# This function takes a line of text and returns
# a list of words in the line
def split_line(line):
    split = re.findall('[A-Za-z]+(?:\'\"[A-Za-z]+)?', line)
    return split
# Opens the dictionary text file and adds each line to an array, then closes the file
dictionary = open("dictionary.txt")
dict_array = []
for item in dictionary:
    dict_array.append(split_line(item))
print(dict_array)
dictionary.close()
print("---Linear Search---")
# Opens the text for the first chapter of Alice in Wonderland
chapter_1 = open("AliceInWonderland200.txt")
# Breaks down the text by line
for each_line in chapter_1:
    # Breaks down each line to a single word
    words = split_line(each_line)
    # Checks each word against the dictionary array
    for each_word in words:
        i = 0
        # Continues as long as there are more words in the dictionary and no match
        while i < len(dict_array) and each_word.upper() != dict_array[i]:
            i += 1
        # if no match was found print the word being checked
        if not i <= len(dict_array):
            print(each_word)
# Closes the first chapter file
chapter_1.close()
Something like this should do (pseudocode):
sampleDict = {}
for each word in AliceInWonderLand200.txt:
    sampleDict[word] = True
actualWords = {}
for each word in dictionary.txt:
    actualWords[word] = True
for each word in sampleDict:
    if not (word in actualWords):
        # Oh no! word isn't in the dictionary
A set may be more appropriate than a dict, since the value of the dictionary in the sample isn't important. This should get you going, though.
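If the lab really does require a linear search, two details in the question's code look like the cause of the missing output: dict_array.append(split_line(item)) stores a list per line, so a word string never equals an entry, and the test `not i <= len(dict_array)` can never be true because the loop stops at i == len(dict_array). A minimal sketch of a corrected version follows. It assumes the dictionary words are uppercase (as the question's use of .upper() suggests), and linear_search, misspelled, and the tiny word list are made up for the example:

```python
import re

def split_line(line):
    """Return the words in a line, allowing an apostrophe inside a word."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", line)

def linear_search(words, target):
    """Return the index of target in words, or -1 if it is absent."""
    i = 0
    while i < len(words) and words[i] != target:
        i += 1
    # The loop ran off the end only when no match was found.
    return i if i < len(words) else -1

def misspelled(text_lines, dictionary_words):
    """Collect every word that the linear search cannot find."""
    missing = []
    for line in text_lines:
        for word in split_line(line):
            if linear_search(dictionary_words, word.upper()) == -1:
                missing.append(word)
    return missing

# Tiny stand-ins for dictionary.txt (one uppercase word per line) and the chapter text.
dictionary_words = ["ALICE", "WAS", "TIRED"]
result = misspelled(["Alice was verry tired"], dictionary_words)
print(result)
```

When loading the real dictionary file, extending the list with the words from each line (rather than appending the list that split_line returns) keeps dictionary_words a flat list of strings.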

Google search from python app

I'm trying to take an input file, read each line, search Google with that line, and print the search results for the query. I get the first search result, which is from Wikipedia, which is great, but then I get the error:
File "test.py", line 24, in
dictionary[str(lineToRead)].append(str(i))
KeyError: 'mouse'
input file pets.txt looks like this:
cat
dog
bird
mouse
inputFile = open("pets.txt", 'r') # Makes File object
outputFile = open("results.csv", "w")
dictionary = {} # Our "hash table"
compare = "https://en.wikipedia.org/wiki/" # urls will compare against this string
for line in inputFile.read().splitlines():
    # ---- testing ---
    print line
    lineToRead = line
inputFile.close()
from googlesearch import GoogleSearch
gs = GoogleSearch(lineToRead)
#gs.results_per_page = 5
#results = gs.get_results()
for i in gs.top_urls():
    print i # check to make sure this is printing out url's
    compare2 = i
    if compare in compare2: # compare the two url's
        dictionary[str(lineToRead)].append(str(i)) #write out query string to dictionary key & append the urls
for i in dictionary:
    print i
    outputFile.write(str(i))
    for j in dictionary[i]:
        print j
        outputFile.write(str(j))
#outputFile.write(str(i)) #write results for the query string to the results file.
#to check if hash works print key /n print values /n print : /n print /n
#-----------------------------------------------------------------------------
Jeremy Banks is right. If you write dictionary[str(lineToRead)].append(str(i)) without first initializing a value for dictionary[str(lineToRead)] you will get an error.
It looks like you have an additional bug. The value of lineToRead will always be mouse, since you have already looped through and closed your input file before searching for anything. Most likely, you want to run the search for every word in inputFile (i.e. cat, dog, bird, mouse).
To fix this, we can write the following (assuming you want to keep a list of URLs as the value in the dictionary for each search term):
for line in inputFile.read().splitlines(): # loop through each line in input file
    lineToRead = line
    dictionary[str(lineToRead)] = [] #initialize to empty list
    gs = GoogleSearch(lineToRead) # search for the current line
    for i in gs.top_urls():
        print i # check to make sure this is printing out url's
        compare2 = i
        if compare in compare2: # compare the two url's
            dictionary[str(lineToRead)].append(str(i)) # append the url to the list stored under the query string
inputFile.close()
You can delete the for loop you wrote for 'testing' the inputFile.
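An alternative to initializing each key by hand is collections.defaultdict, which creates the empty list the first time a key is used. The sketch below shows only the grouping logic and does not call any search library: group_wikipedia_urls and the fake_results data are hypothetical stand-ins for the URLs a search would return, and the code is written for Python 3:

```python
from collections import defaultdict

def group_wikipedia_urls(results_by_query, prefix="https://en.wikipedia.org/wiki/"):
    """Map each query to the URLs in its results that contain the prefix."""
    grouped = defaultdict(list)  # a missing key starts out as an empty list
    for query, urls in results_by_query.items():
        for url in urls:
            if prefix in url:
                grouped[query].append(url)  # no KeyError on first use
    return dict(grouped)

# Made-up search results, standing in for the URLs returned per query.
fake_results = {
    "cat": ["https://en.wikipedia.org/wiki/Cat", "https://example.com/cats"],
    "mouse": ["https://en.wikipedia.org/wiki/Mouse"],
}
grouped = group_wikipedia_urls(fake_results)
print(grouped)
```

One difference from the explicit initialization above: a query with no matching URL never becomes a key here, whereas `dictionary[key] = []` records it with an empty list.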
