I have a directory of files named 45-1.txt, 1-17.txt, etc. Basically each name is two numbers separated by a '-', with .txt at the end.
And I have a dataset that looks like this, but with thousands of lines:
values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0
values/test/10/blueprint-0.png,2112.0,545.0,2136.0,554.0
values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0
The values I care about in these lines are the first two numbers of each line, so 10-0, 10-0, 45-1, etc.
What I want to do is copy the lines whose two numbers (say 10-0) appear as part of the name of one of the files above; in this example the 45-1 lines should be copied.
My approach:
import os, csv, re

my_dict = {}
source_dir = '/home/ubuntu/Desktop/EAST/testing_txts/'
for element in os.listdir(source_dir):
    my_dict[element] = ''
# print(my_dict)
with open('/home/ubuntu/Desktop/EAST/ground_truth.txt') as f:
    reader = csv.reader(f)
    for key in my_dict:
        for filename, *numbers in reader:
            k1, k2 = re.findall(r'\d+', filename)
            k3, k4 = re.findall(r'\d+', key)
            if k3 == k1 and k2 == k4:
                my_dict[key].append(filename)
To explain what I did a bit: I read all the file names in my directory and made them keys in a dictionary, then I read my file line by line for each key; if I find a matching line, I append the entire line to that dictionary key. So assuming the directory contains 25-1.txt, 45-1.txt and 1-0.txt, and the other file contains:
values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0
values/test/10/blueprint-0.png,2112.0,545.0,2136.0,554.0
values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0
values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0
the end result will be 3 keys, with only 45-1 having elements in it, namely values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0 and values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0 (a list of elements). The issue I had with my code above is that I can't append the full line properly and get my keys populated: I get the error that append can't be used with strings. And when I used my_dict[key] = filename to test (knowing it's wrong and overwrites), only my first key got any element; the rest stayed empty even though they should have matches as well.
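In other words, for the example directory and lines above, the desired end result would look roughly like this (illustrative only):

{'25-1.txt': [],
 '1-0.txt': [],
 '45-1.txt': ['values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0',
              'values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0']}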
Edit:
After fixing the list issue thanks to a helpful answer and making some quick adjustments, my code became:
import os, csv, re

my_dict = {}
source_dir = '/home/ubuntu/Desktop/EAST/testing_txts/'
for element in os.listdir(source_dir):
    my_dict[element] = []
# print(my_dict)
with open('/home/ubuntu/Desktop/EAST/ground_truth.txt') as f:
    reader = csv.reader(f)
    for key in my_dict:
        for filename in reader:
            print(filename)
            k = []
            k.append(re.findall(r'\d+', str(filename)))
            k1, k2 = k[0][0], k[0][1]
            k3, k4 = re.findall(r'\d+', key)
            if k3 == k1 and k2 == k4:
                my_dict[key].append(filename)
print(my_dict)
However, my main issue persists: not every key is getting its elements, and many keys stay empty.
for element in os.listdir(source_dir):
    my_dict[element] = ''
You have initialized your my_dict values as strings. Hence, when you use append, it will raise an AttributeError, because you can't append to a string.
Approach 1 is to make the values lists and, if needed, join them into strings after reading (a sketch of that join follows the snippet below); append will not throw an error in this case:
for element in os.listdir(source_dir):
    my_dict[element] = []
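If a single string per key is eventually wanted, the collected rows can be joined after the matching loop. A small sketch (the separators chosen here are assumptions):

for key, rows in my_dict.items():
    my_dict[key] = '\n'.join(','.join(r) for r in rows)  # each r is a csv.reader row (a list of fields)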
Approach 2 is to use string concatenation:
my_dict[key] += filename
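A minimal self-contained sketch of Approach 2, with made-up rows standing in for the real ground_truth.txt contents:

rows = [
    "values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0",
    "values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0",
]
my_dict = {"45-1.txt": ""}              # the value starts life as a string
for line in rows:
    my_dict["45-1.txt"] += line + "\n"  # += works because both sides are strings
print(my_dict)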
Issue 2
This is caused by the looping over the dict combined with the csv reader: csv.reader(f) is a one-shot iterator, so after the inner loop runs for the first key the reader is exhausted, and every later key sees no rows at all.
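A quick way to see this (illustrative only, not part of the original code): a csv.reader is a one-shot iterator, so a second pass over it yields nothing.

import csv, io
reader = csv.reader(io.StringIO("a,1\nb,2\n"))
print(list(reader))  # [['a', '1'], ['b', '2']]
print(list(reader))  # [] -- the reader is already exhausted

Swapping the loops so the file is read only once, building the key from each row and looking it up in the dict, avoids this: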
with open('/home/ubuntu/Desktop/EAST/ground_truth.txt') as f:
    reader = csv.reader(f)
    for filename in reader:                       # each row is a list; filename[0] is the image path
        print(filename)
        k1, k2 = re.findall(r'\d+', filename[0])  # digits from the path only
        key = k1 + "-" + k2 + ".txt"              # the dict keys include the .txt suffix
        if key in my_dict:
            my_dict[key].append(filename)
print(my_dict)
import os, csv, re

my_dict = {}
source_dir = 'source'
for element in os.listdir(source_dir):
    my_dict[element] = []
# print(my_dict)
with open('readme.txt') as f:
    reader = f.readlines()
for key in my_dict:
    for line in reader:
        k1 = re.findall(r'\d+', line)
        k1 = k1[0] + k1[1]
        key_stripped = key.replace('-', '').replace('.txt', '')
        if k1 == key_stripped:
            my_dict[key].append(line)
print(my_dict)
Related
I am trying to join two CSV files based on one common column.
I am reading each CSV file and storing it as a list of tuples. My code:
def read_csv(path):
    file = open(path, "r")
    content_list = []
    for line in file.readlines():
        record = line.split(",")
        for i in range(len(record)):
            record[i] = record[i].replace("\n", "")
        content_list.append(tuple(record))
    return content_list
a_list = read_csv("a.csv")
b_list = read_csv("b.csv")
This gives me a list with the CSV header as the first tuple in the list:
a_list
[('user_id', 'activeFl'),
('80c611f1-532a-4f7d-aa80-f28b472c0dbe', 'True'),
('4d04ab57-1b50-4474-bd12-b2b16ed2cca3', 'True'),
('0f37a42a-a984-4402-97bd-0eac95fa95d1', 'True'),
('dbe15b19-0128-4e3a-a82b-c8154d272c18', 'True'), ......]
b_list
[('id','date','user_id','blockedFl','amount','type'),
('b7819826-6468-4416-9953-e739d8046b81','2021-04-23','18a382ef-bd38-4884-8bf','True','9.04','6'), ....]
I would like to merge these two lists based on the user_id, but I am stuck at this point. What can I try next?
The O(N^2) solution is:
result = list()
for left in a_list[1:]:                  # skip the header tuple
    for right in b_list[1:]:
        if left[0] == right[2]:          # a_list user_id vs b_list user_id (third column)
            result.append(right + left[1:])
            break
O(N) using dictionary:
result = list()
b_dict = {x[2]: x for x in b_list[1:]}   # key the b rows by their user_id column
for left in a_list[1:]:
    if left[0] in b_dict:
        result.append(b_dict.get(left[0]) + left[1:])
This is one approach using the csv module and a dict.
Ex:
import csv

def read_csv(path, key_col=0):
    with open(path) as infile:
        reader = csv.reader(infile)
        header = next(reader)                            # skip the header row
        content = {row[key_col]: row for row in reader}  # key_col selects the join column (user_id)
    return content

a_list = read_csv("a.csv")                     # keyed by user_id (first column)
b_list = read_csv("b.csv", key_col=2)          # keyed by user_id (third column)
merge_data = {k: v + a_list.get(k, [])[1:] for k, v in b_list.items()}  # b row + a's remaining columns
print(merge_data)  # OR print(list(merge_data.values()))
So I have a text file like this
123
1234
123
1234
12345
123456
You can see 123 appears twice, so both instances should be removed, but 12345 appears once so it stays. My text file is about 70,000 lines.
Here is what I came up with.
file = open("test.txt",'r')
lines = file.read().splitlines() #to ignore the '\n' and turn to list structure
for appId in lines:
if(lines.count(appId) > 1): #if element count is not unique remove both elements
lines.remove(appId) #first instance removed
lines.remove(appId) #second instance removed
writeFile = open("duplicatesRemoved.txt",'a') #output the left over unique elements to file
for element in lines:
writeFile.write(element + "\n")
When I run this I feel like my logic is correct, but I know for a fact the output is supposed to be around 950 lines, yet I'm still getting about 23,000 elements in my output, so a lot is not getting removed. Any ideas where the bug could reside?
Edit: I FORGOT TO MENTION. An element can only appear twice MAX.
Use Counter from the built-in collections module:
In [1]: from collections import Counter
In [2]: a = [123, 1234, 123, 1234, 12345, 123456]
In [3]: a = Counter(a)
In [4]: a
Out[4]: Counter({123: 2, 1234: 2, 12345: 1, 123456: 1})
In [5]: a = [k for k, v in a.items() if v == 1]
In [6]: a
Out[6]: [12345, 123456]
For your particular problem I will do it like this:
from collections import defaultdict

out = defaultdict(int)
with open('input.txt') as f:
    for line in f:
        out[line.strip()] += 1

with open('out.txt', 'w') as f:
    for k, v in out.items():
        if v == 1:  # here you use logic suitable for what you want
            f.write(k + '\n')
Be careful about removing elements from a list while still iterating over that list. This changes the behavior of the list iterator, and can make it skip over elements, which may be part of your problem.
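A tiny demonstration of that skipping, with illustrative data (not from the question):

nums = [1, 1, 2, 2]
for n in nums:
    if nums.count(n) > 1:
        nums.remove(n)
print(nums)  # [1, 2] -- every value appears twice, yet two survive, because the iterator skips ahead after each removal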
Instead, I suggest creating a filtered copy of the list using a list comprehension - instead of removing elements that appear more than twice, you would keep elements that appear less than that:
file = open("test.txt",'r')
lines = file.read().splitlines()
unique_lines = [line for line in lines if lines.count(line) <= 2] # if it appears twice or less
with open("duplicatesRemoved.txt", "w") as writefile:
writefile.writelines(unique_lines)
You could also easily modify this code to look for only one occurrence (if lines.count(line) == 1) or for more than two occurrences.
You can count all of the elements and store them in a dictionary:
dic = {a:lines.count(a) for a in lines}
Then remove all the duplicated ones from the list:
for k in dic:
    if dic[k] > 1:
        while k in lines:
            lines.remove(k)
NOTE: The while loop is needed because lines.remove(k) removes only the first k value from the list, so it must be repeated until there is no k value left in the list.
If the for loop is complicated, you can use the dictionary in another way to get rid of duplicated values:
lines = [k for k, v in dic.items() if v==1]
I am new to Python and not able to figure out how to accomplish this.
Suppose file.txt contains following
iron man 1
iron woman 2
man ant 3
woman wonder 4
i want to read this file into dictionary in the below format
dict = { 'iron' : ['man', 'woman'], 'man' : ['ant'], 'woman' : ['wonder'] }
That is, the last field in each line is omitted when writing to the dictionary.
My second question is: can I read this file into a dictionary such that
dict2 = { 'iron' : [('man', '1'), ('woman', '2')], 'man' : [('ant', '3')], 'woman' : [('wonder', '4')] } .
That is, key iron will have two values, but each of these values is an individual tuple.
The second question is for an implementation of uniform cost search, so that I can access iron's children man and woman, and the costs for these children, 1 and 2.
Thank you in advance
Here you go, both parts of your question... Being new to Python, just spend some time with it and you will get to know how it behaves:
with open('file.txt', 'r') as data:
    k = data.read()
lst = k.splitlines()
print(lst)
dic = {}
dic2 = {}
for i in lst:
    p = i.split(" ")
    if p[0] in dic:
        dic[p[0]].append(p[1])           # the middle word only
        dic2[p[0]].append((p[1], p[2]))  # (word, cost) tuple
    else:
        dic[p[0]] = [p[1]]
        dic2[p[0]] = [(p[1], p[2])]
print(dic)
print(dic2)
You can use collections.defaultdict.
Your 1st answer:
from collections import defaultdict

my_dict = defaultdict(list)
with open('your_file') as f:
    for line in f:
        line = line.strip().split()
        my_dict[line[0]].append(line[1])
print(my_dict)
Taking the above example, you should be able to solve the 2nd question along the same lines; a sketch follows.
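A hedged sketch for the second dictionary (word/cost tuples), following the same defaultdict pattern; 'your_file' is the same placeholder path as above:

from collections import defaultdict

my_dict2 = defaultdict(list)
with open('your_file') as f:
    for line in f:
        parts = line.split()
        if len(parts) == 3:
            my_dict2[parts[0]].append((parts[1], parts[2]))  # e.g. ('man', '1')
print(dict(my_dict2))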
For the first question, what you want to do is read each line, split it using white spaces and use a simple if rule to control how you add it to your dictionary:
my_dict = {}
with open('file.txt', 'r') as f:
    content = f.read()
for line in content.split('\n'):
    item = line.split()
    if not item:  # skip blank or trailing empty lines
        continue
    if item[0] not in my_dict:
        my_dict[item[0]] = [item[1]]
    else:
        my_dict[item[0]].append(item[1])
The second question is pretty much the same, only with a slightly different assignment:
my_dict2 = {}
for line in content.split('\n'):
    item = line.split()
    if not item:
        continue
    if item[0] not in my_dict2:
        my_dict2[item[0]] = [(item[1], item[2])]
    else:
        my_dict2[item[0]].append((item[1], item[2]))
Say I have a file "stuff.txt" that contains the following on separate lines:
q:5
r:2
s:7
I want to read each of these lines from the file, and convert them to dictionary elements, the letters being the keys and the numbers the values.
So I would like to get
y ={"q":5, "r":2, "s":7}
I've tried the following, but it just prints an empty dictionary "{}"
y = {}
infile = open("stuff.txt", "r")
z = infile.read()
for line in z:
    key, value = line.strip().split(':')
    y[key].append(value)
print(y)
infile.close()
try this:
d = {}
with open('text.txt') as f:
    for line in f:
        key, value = line.strip().split(':')
        d[key] = int(value)
You are appending to d[key] as if it was a list. What you want is to just straight-up assign it like the above.
Also, using with to open the file is good practice, as it auto closes the file after the code in the 'with block' is executed.
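Roughly what the with block does for you, written out by hand (an illustrative equivalent, not a recommendation):

d = {}
f = open('text.txt')
try:
    for line in f:
        key, value = line.strip().split(':')
        d[key] = int(value)
finally:
    f.close()  # runs even if the loop raises an exception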
There are some possible improvements to be made. The first is using a context manager for file handling, that is, with open(...): in case of an exception, this will still handle all the needed cleanup for you.
Second, you have a small mistake in your dictionary assignment: the values are assigned using = operator, such as dict[key] = value.
y = {}
with open("stuff.txt", "r") as infile:
    for line in infile:
        key, value = line.strip().split(':')
        y[key] = int(value)  # convert so the values are numbers, as in the desired output
print(y)
Python3:
with open('input.txt', 'r', encoding="utf-8") as f:
    for line in f.readlines():
        s = []                                     # converting the line into a list of tokens
        for i in line.split(" "):
            s.append(i)
        d = dict(x.strip().split(":") for x in s)  # converting the list to a dictionary
        e = {a: int(x) for a, x in d.items()}      # converting the values from strings to integers
        print(e)
I have a file comprising two columns, i.e.,
1 a
2 b
3 c
I wish to read this file to a dictionary such that column 1 is the key and column 2 is the value, i.e.,
d = {1:'a', 2:'b', 3:'c'}
The file is small, so efficiency is not an issue.
d = {}
with open("file.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
This will leave the key as a string:
with open('infile.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
You can also use a dict comprehension like:
with open("infile.txt") as f:
d = {int(k): v for line in f for (k, v) in [line.strip().split(None, 1)]}
def get_pair(line):
    key, sep, value = line.strip().partition(" ")
    return int(key), value

with open("file.txt") as fd:
    d = dict(get_pair(line) for line in fd)
By dictionary comprehension
d = { line.split()[0] : line.split()[1] for line in open("file.txt") }
Or by pandas:
import pandas as pd
d = pd.read_csv("file.txt", delimiter=" ", header=None, index_col=0).to_dict()[1]  # first column as the index, second column as the values
Simple Option
Most methods for storing a dictionary use JSON, Pickle, or line reading. Provided you're not editing the dictionary outside of Python, this simple method should suffice for even complex dictionaries, although Pickle will be better for larger ones.
x = {1:'a', 2:'b', 3:'c'}
f = 'file.txt'
print(x, file=open(f,'w')) # file.txt >>> {1:'a', 2:'b', 3:'c'}
y = eval(open(f,'r').read())
print(x==y) # >>> True
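For comparison, the JSON route mentioned above; note that a round trip through JSON turns the integer keys into strings:

import json

x = {1: 'a', 2: 'b', 3: 'c'}
with open('file.json', 'w') as f:
    json.dump(x, f)
with open('file.json') as f:
    y = json.load(f)
print(y)  # {'1': 'a', '2': 'b', '3': 'c'} -- keys come back as strings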
If you love one liners, try:
d=eval('{'+re.sub('\'[\s]*?\'','\':\'',re.sub(r'([^'+input('SEP: ')+',]+)','\''+r'\1'+'\'',open(input('FILE: ')).read().rstrip('\n').replace('\n',',')))+'}')
Input FILE = Path to file, SEP = Key-Value separator character
Not the most elegant or efficient way of doing it, but quite interesting nonetheless :)
IMHO a bit more pythonic to use generators (probably you need 2.7+ for this):
with open('infile.txt') as fd:
    pairs = (line.split(None) for line in fd)
    res = {int(pair[0]): pair[1] for pair in pairs if len(pair) == 2 and pair[0].isdigit()}
This will also filter out lines not starting with an integer or not containing exactly two items
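An illustrative check of that filtering, using an in-memory stand-in for infile.txt:

import io

fd = io.StringIO("1 a\nnot-a-number b\n2\n3 c\n")
pairs = (line.split(None) for line in fd)
res = {int(pair[0]): pair[1] for pair in pairs if len(pair) == 2 and pair[0].isdigit()}
print(res)  # {1: 'a', 3: 'c'} -- the malformed lines are dropped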
I had a requirement to take values from a text file and use them as key-value pairs. I have content in the text file in the form key = value, so I used the split method with "=" as the separator and wrote the code below:
d = {}
file = open("filename.txt")
for x in file:
    f = x.split("=")
    d.update({f[0].strip(): f[1].strip()})
By using the strip method, any spaces before or after the "=" separator are removed, and you will have the expected data in dictionary format.
import re

my_file = open('file.txt', 'r')
d = {}
for i in my_file:
    g = re.search(r'(\d+)\s+(.*)', i)  # match a line containing an int followed by a string
    d[int(g.group(1))] = g.group(2)
Here's another option...
events = {}
for line in csv.reader(open(os.path.join(path, 'events.txt'), "rb")):
    if line[0][0] == "#":
        continue
    events[line[0]] = line[1] if len(line) == 2 else line[1:]
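A Python 3 variant of the same idea, in case it is useful: the "rb" mode above is Python 2 style, and in Python 3 the csv module expects a text file opened with newline=''. The path and the events.txt contents are assumptions carried over from the snippet above:

import csv, os

path = "."  # hypothetical directory; the snippet above assumes it is defined elsewhere
events = {}
with open(os.path.join(path, 'events.txt'), newline='') as f:
    for line in csv.reader(f):
        if not line or line[0].startswith('#'):  # skip blank rows and comment lines
            continue
        events[line[0]] = line[1] if len(line) == 2 else line[1:]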