I have a .txt (notepad) file called Log1. It has the following saved in it: [1, 1, 1, 0]
When I write a program to retrieve the data:
Log1 = pd.read_csv('Path...\\Log1.txt')
Log1 = list(Log1)
print(Log1)
It prints: ['[1', ' 1', ' 1.1', ' 0]']
I don't understand where the ".1" on the third number is coming from. It's not in the text file; it just gets added.
Funnily enough, if I change the numbers in the text file to [1, 0, 1, 1], it does not add the .1. It prints ['[1', ' 0', ' 1', ' 1]']
Very odd; if anyone has an idea why it's acting this way, I'd appreciate it.
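(The `.1` comes from pandas, not the file: `read_csv` treats the first line as a header row, and when two column names collide, as the two ` 1` entries do here, pandas de-duplicates them by appending `.1`. A minimal reproduction, using `io.StringIO` in place of the file:)

```python
import io
import pandas as pd

# read_csv treats the single line as a header row; the duplicate " 1"
# column name gets ".1" appended by pandas' duplicate-name handling.
df = pd.read_csv(io.StringIO("[1, 1, 1, 0]"))
print(list(df))  # ['[1', ' 1', ' 1.1', ' 0]']

# With header=None the line is kept as data instead of column names.
df2 = pd.read_csv(io.StringIO("[1, 1, 1, 0]"), header=None)
print(df2.values.tolist())
```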
Well, I worked out some other options as well, just for the record:
Solution 1 (plain read - this one gets a list of string)
log4 = []
with open('log4.txt') as f:
    log4 = f.readlines()
print(log4)
Solution 2 (convert to list of ints)
import ast
with open('log4.txt', 'r') as f:
    inp = ast.literal_eval(f.read())
print(inp)
Solution 3 (old school string parsing - convert to list of ints, then put it in a dataframe)
import pandas as pd

with open('log4.txt', 'r') as f:
    mylist = f.read()
mylist = mylist.replace('[', '').replace(']', '').replace(' ', '')
mylist = mylist.split(',')
df = pd.DataFrame({'Col1': mylist})
df['Col1'] = df['Col1'].astype(int)
print(df)
Other ideas here as well:
https://docs.python-guide.org/scenarios/serialization/
In general, reading from the text file (deserializing) is easier if the file was written in a well-structured format in the first place: a csv file, pickle file, json file, etc. In this case, ast.literal_eval() worked well since the data was written out as a list using its __repr__ format. Honestly, I'd never done that before, so it was an interesting solution to me as well :)
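For example, if the writing side serialized the list as JSON in the first place, the read side becomes a one-liner (the file name here is hypothetical):

```python
import json

data = [1, 1, 1, 0]

# Writing side: serialize the list as JSON text.
with open('log1.json', 'w') as f:
    json.dump(data, f)

# Reading side: deserialize straight back to a list of ints.
with open('log1.json') as f:
    restored = json.load(f)

print(restored)  # [1, 1, 1, 0]
```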
This should work. Can you please try this:
log2 = log1.values.tolist()
Output:
[['1'], ['1'], ['1'], ['0']]
Your data is not in a CSV format. In CSV you would rather have
1;1;0;1
or something similar.
If you have multiple lines like this, it might make sense to parse this as CSV, otherwise I'd rather parse it using a regexp and .split on the result.
Proposal: Add a bigger input example and your expected output.
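For a single line like the one shown, the regexp route could be as simple as this (one sketch among many, assuming the line only contains the numbers you want):

```python
import re

line = '[1, 1, 1, 0]'
# Pull out every (optionally signed) integer and convert it.
numbers = [int(n) for n in re.findall(r'-?\d+', line)]
print(numbers)  # [1, 1, 1, 0]
```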
I am trying to process a csv file and want to extract the entire row if it contains a string, adding it to another, brand-new list. But my approach gives me all the rows that contain that string, whereas I want only the row with the exact string. Let me explain it with an example:
I have the following list of lists:
myList = [['abc', 1, 3, 5, 6], ['abcx', 5, 6, 8, 9], ['abcn', 7, 12, 89, 23]]
I want to get the whole list which has the string 'abc'. I tried the following:
newList = []
for temp in myList:
    if 'abc' in temp:
        newList.append(temp)
But this gives me all the values, as 'abc' is a substring of the other strings too. What is a cleaner approach to solve this problem?
Update:
I have a huge CSV file, which I am reading line by line using readlines(), and I want to find the line which has the "abc" gene and shove the whole line into a list. But when I do the `'abc' in` check, I get all the other strings which also have "abc" as a substring. How can I ignore the substrings?
From your comment on the question, I think it is straightforward to use numpy and pandas if you want to process a csv file. Pandas has a built-in csv reader, and you can extract the row and convert it into a list or a numpy array in a couple of lines with ease. Here's how I would do it:
import pandas
df = pandas.read_csv("your_csv")
#assuming you have column names.
x = df.loc[df['col_name'] == 'abc'].values.tolist() #this will give you the whole row and convert into a list.
Or
import numpy as np
x = np.array(df.loc[df['col_name'] == 'abc']) #gives you a numpy array
This gives you much more flexibility to do processing. I hope this helps.
It seems you want to append only if the string matches 'abc' and nothing else (e.g. true for 'abc', but false for 'abcx'). Is this correct?
If so, you need to make two corrections.
First, you need to index the list. Currently temp is the entire sub-list; if you know the string will always be in position 0, index that in the if statement (if you don't, a nested for loop will work).
Second, you need to use '==' instead of 'in' when comparing strings: 'in' matches a substring of a larger string, whereas '==' requires an exact match.
newList = []
for temp in myList:
    if temp[0] == 'abc':
        newList.append(temp)
or
newList = [temp for temp in myList if temp[0] == 'abc']
Your code works, as others have said before me.
Part of your question was to get a cleaner code. Since you only want the sub-lists that contain your string, I would recommend to use filter:
check_against_string = 'abc'
newList = list(filter(lambda sub_list: check_against_string in sub_list, myList))
filter creates a list of elements for which a function returns true. It is exactly the code you wrote, but more pythonic!
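To see why the original loop already works as posted: `in` applied to a list tests whole-element equality, not substring containment, so 'abcx' never matches. A quick demonstration:

```python
myList = [['abc', 1, 3, 5, 6], ['abcx', 5, 6, 8, 9], ['abcn', 7, 12, 89, 23]]

# Membership on a list compares whole elements...
print('abc' in myList[1])   # False: 'abc' != 'abcx'
# ...while membership on a string is a substring test.
print('abc' in 'abcx')      # True

newList = [temp for temp in myList if 'abc' in temp]
print(newList)              # [['abc', 1, 3, 5, 6]]
```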
I'm trying to extract some info from a file. The file has many lines like the one below
"names":["DNSCR"],"actual_names":["RADIO_R"],"castime":[2,4,6,8,10] ......
I want to search each line for names and castime; if found, I want to print the values in the brackets.
The values in the brackets change from line to line. For example, in the line above, names is DNSCR and castime is 2,4,6,8,10, but the length might be different in the next line.
I have tried the following code, but it always gives me 10 characters, and I only need whatever is in the brackets.
c_req = 10
keywords = ['"names":', '"castime":']
with open('mylogfile.log') as searchfile:
    for line in searchfile:
        for key in keywords:
            left, sep, right = line.partition(key)
            if sep:
                print(key + " = " + right[:c_req])
This looks just like JSON. Are there braces around each line?
If so, the whole content is trivial to parse:
import json
test = '{"names":["DNSCR"],"actual_names":["RADIO_R"],"castime":[2,4,6,8,10]}'
result = json.loads(test)
print(result["names"], result["castime"])
You could also use a library like pandas to read the whole file into a dataframe if it matches a whole JSON file.
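If the lines in the file are not already wrapped in braces, one speculative way to turn each fragment into valid JSON before parsing is to add the braces yourself:

```python
import json

# Stand-in for the lines read from the log file.
lines = ['"names":["DNSCR"],"actual_names":["RADIO_R"],"castime":[2,4,6,8,10]']

for line in lines:
    # Wrap the key/value fragment in braces so json.loads accepts it.
    record = json.loads('{' + line.strip() + '}')
    print(record['names'], record['castime'])
```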
Use Regular Expression:
import re
# should contain all lines
lines = ['"names":["DNSCR"],"actual_names":["RADIO_R"],"castime":[2,4,6,8,10]']
# more efficient in large files
# raw strings avoid invalid-escape warnings in the patterns
names_pattern = re.compile(r'"names":\["(\w+)"\]')
castime_pattern = re.compile(r'"castime":\[(.+)\],?')
names, castimes = list(), list()
for line in lines:
    names.append(re.search(names_pattern, line).group(1))
    castimes.append(
        [int(num) for num in re.search(castime_pattern, line).group(1).split(',')]
    )
Add the exception handling and the file opening/reading yourself.
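With the file reading and error handling folded in, the sketch above might become something like this (the file name is the asker's; note that `re.search` returns `None` on lines that lack a key, which the `if` guards against):

```python
import re

NAMES_RE = re.compile(r'"names":\["(\w+)"\]')
CASTIME_RE = re.compile(r'"castime":\[(.+?)\]')

def parse_lines(lines):
    """Collect the names and castime lists from an iterable of log lines."""
    names, castimes = [], []
    for line in lines:
        name_m = NAMES_RE.search(line)
        cast_m = CASTIME_RE.search(line)
        if name_m and cast_m:  # skip lines that lack either key
            names.append(name_m.group(1))
            castimes.append([int(n) for n in cast_m.group(1).split(',')])
    return names, castimes

# Usage with the file and basic error handling:
try:
    with open('mylogfile.log') as f:
        names, castimes = parse_lines(f)
except OSError as e:
    print('could not read log file:', e)
```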
Given mylogfile.log:
"names":["DNSCR"],"actual_names":["RADIO_R"],"castime":[2,4,6,8,10]
"names":["FOO", "BAR"],"actual_names":["RADIO_R"],"castime":[1, 2, 3]
Using regular expressions and ast.literal_eval:
import ast
import re
keywords = ['"names":', '"castime":']
keywords_name = ['names', 'castime']
d = {}
with open('mylogfile.log') as searchfile:
    for i, line in enumerate(searchfile):
        d['line ' + str(i)] = {}
        for key, key_name in zip(keywords, keywords_name):
            d['line ' + str(i)][key_name] = ast.literal_eval(re.search(key + r'\[(.*?)\]', line).group(1))
print(d)
#{ 'line 0': {'castime': (2, 4, 6, 8, 10), 'names': 'DNSCR'},
# 'line 1': {'castime': (1, 2, 3), 'names': ('FOO', 'BAR')}}
re.search(key + r'\[(.*?)\]', line).group(1) will catch everything in between the [] after your keys.
And ast.literal_eval() will remove the useless quotes and spaces in your string and automatically create tuples when needed.
I also used enumerate to keep track of which line each entry came from in the log file.
I am new to python and want to split what I have read in from a text file into two specific parts. Below is an example of what could be read in:
f = ['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]
What I want to achieve, so that the second part of the program can run, is:
words = ['Cats','like','dogs','as','much','cats.']
numbers = [1,2,3,4,5,4,3,2,6]
I have tried using:
words,numbers = f.split("][")
However, this removes the two brackets at the split point from the new variables, which means the second part of my program, which recreates the original text, does not work.
Thanks.
I assume f is a string like
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
Then we can find the index of '][' and add one to find the point between the brackets:
i = f.index('][')
a, b = f[:i+1], f[i+1:]
print(a)
print(b)
output:
['Cats','like','dogs','as','much','cats.']
[1,2,3,4,5,4,3,2,6]
Another alternative, if you still want to use split():
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
d="]["
print(f.split(d)[0] + d[0])
print(d[1] + f.split(d)[1])
If you can make your file look something like this:
[["Cats","like","dogs","as","much","cats."],[1,2,3,4,5,4,3,2,6]]
then you could simply use Python's json module to do this for you. Note that the JSON format requires double quotes rather than single.
import json
f = '[["Cats","like","dogs","as","much","cats."],[1,2,3,4,5,4,3,2,6]]'
a, b = json.loads(f)
print(a)
print(b)
Documentation for the json library can be found here: https://docs.python.org/3/library/json.html
An alternative to Patrick's answer using regular expressions:
import re
data = "f = ['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
pattern = r'f = (?P<words>\[.*?\])(?P<numbers>\[.*?\])'
match = re.match(pattern, data)
words = match.group('words')
numbers = match.group('numbers')
print(words)
print(numbers)
Output
['Cats','like','dogs','as','much','cats.']
[1,2,3,4,5,4,3,2,6]
If I understand correctly, you have a text file that contains ['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6] and you just need to split that string at the transition between brackets. You can do this with the string.index() method and string slicing. See my console output below:
>>> f = open('./catsdogs12.txt', 'r')
>>> input = f.read()[:-1] # Read file without trailing newline (\n)
>>> input
"['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
>>> bracket_index = input.index('][') # Get index of transition between brackets
>>> bracket_index
41
>>> words = input[:bracket_index + 1] # Slice from beginning of string
>>> words
"['Cats','like','dogs','as','much','cats.']"
>>> numbers = input[bracket_index + 1:] # Slice from middle of string
>>> numbers
'[1,2,3,4,5,4,3,2,6]'
Note that this will leave you with Python strings that look visually identical to lists (arrays). If you need the data represented as native Python objects (i.e. so that you can actually use it like a list), you'll need some combination of string[1:-1].split(',') on both strings, plus the builtin map() (or a list comprehension) on the numbers list to convert the numbers from strings to ints.
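That conversion could look like this, sticking with the split approach just described:

```python
words_str = "['Cats','like','dogs','as','much','cats.']"
numbers_str = '[1,2,3,4,5,4,3,2,6]'

# Drop the surrounding brackets, split on commas, then strip the quotes.
words = [w.strip("'") for w in words_str[1:-1].split(',')]
# Same idea for the numbers, converting each piece to int.
numbers = [int(n) for n in numbers_str[1:-1].split(',')]

print(words)    # ['Cats', 'like', 'dogs', 'as', 'much', 'cats.']
print(numbers)  # [1, 2, 3, 4, 5, 4, 3, 2, 6]
```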
Hope this helps!
Another thing you can do is first replace '][' with ']-[' and then split or partition on '-'. I'd suggest split, since we don't want the delimiter in the result.
SPLIT
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
f = f.replace('][',']-[')
a,b = f.split('-')
Output
>>> print(a)
['Cats','like','dogs','as','much','cats.']
>>> print(b)
[1,2,3,4,5,4,3,2,6]
PARTITION
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
f = f.replace('][',']-[')
a,b,c = f.partition('-')
Output
>>> print(a)
['Cats','like','dogs','as','much','cats.']
>>> print(c)
[1,2,3,4,5,4,3,2,6]
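For the record, an equivalent without the sentinel character: partition on '][' itself and re-attach a bracket on each side, which avoids trouble if the data ever contains a '-'.

```python
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"

# Partition on the '][' boundary itself, then restore a bracket per side.
a, sep, c = f.partition('][')
a, c = a + ']', '[' + c

print(a)  # ['Cats','like','dogs','as','much','cats.']
print(c)  # [1,2,3,4,5,4,3,2,6]
```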