Storing data in the correct way in Python

In Keywords.txt I have these words and their 'values':
alone,1
amazed,10
amazing,10
bad,1
best,10
better,7
excellent,10
These are some of the keywords and their 'values', which I need to store in a data structure, a list. Each line will later be used to access/extract the word and its 'value'.
The loop I wrote was:
line = KeywordFile.readline()
while line != '':
    line = KeywordFile.readline()
    line = line.rstrip()
And I tried to convert it to a list form by doing this:
list=[line]
However, when I print the list, I get this:
['amazed,10']
['amazing,10']
['bad,1']
['best,10']
['better,7']
['excellent,10']
I don't think that I'll be able to extract my 'values' from the lists that easily if they are inside quotation marks.
I'm looking to get lists in this format:
['amazed',10]
['amazing',10]
['bad',1]
Thanks in advance!

You want to split the line into a list using ',' as the delimiter (avoid naming the variable list, since that shadows the built-in list type):
parts = line.split(',')
Then you want to convert the second value in each list into an int:
parts[1] = int(parts[1])
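Putting the two steps together, a minimal sketch of the whole loop might look like this. It assumes every non-blank line has exactly one comma; `parse_keyword_lines` is a hypothetical helper name, and the sample lines are copied from the question.

```python
def parse_keyword_lines(lines):
    """Turn lines like 'amazed,10' into [word, int] pairs."""
    pairs = []
    for line in lines:
        line = line.strip()            # drop the newline and surrounding spaces
        if not line:                   # skip blank lines instead of stopping
            continue
        word, value = line.split(',')  # assumes exactly one comma per line
        pairs.append([word, int(value)])
    return pairs


# A few lines copied from Keywords.txt in the question.
sample_lines = ['alone,1\n', 'amazed,10\n', 'bad,1\n']
keywords = parse_keyword_lines(sample_lines)
print(keywords)  # [['alone', 1], ['amazed', 10], ['bad', 1]]
```

With the real file, pass the open file object instead of the sample list: `with open('Keywords.txt') as keyword_file: keywords = parse_keyword_lines(keyword_file)`. Iterating over the file object also picks up the first line (alone,1), which the readline()-based loop in the question skips.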

Put the line amazed,10 into sample.csv:
import csv
with open('sample.csv', newline='') as csvfile:
    inputs = csv.reader(csvfile, delimiter=',')
    for row in inputs:
        # convert purely numeric strings to int; the quotes in a printed list only mark strings
        row = [int(x) if x.isdigit() else x for x in row]
        print(row)
Output:
['amazed', 10]
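Since the question says each word will later be used to look up its value, a dict may be more convenient than a list of lists. A sketch using csv.reader, with io.StringIO standing in for the real file (the sample text is copied from the question; `load_keyword_values` is a hypothetical name):

```python
import csv
import io


def load_keyword_values(fileobj):
    """Build {word: int_value} from 'word,value' CSV rows."""
    values = {}
    for row in csv.reader(fileobj):
        if not row:            # csv.reader yields [] for a blank line
            continue
        word, value = row      # assumes exactly two fields per row
        values[word] = int(value)
    return values


sample = io.StringIO("alone,1\namazed,10\nbad,1\nbest,10\n")
keyword_values = load_keyword_values(sample)
print(keyword_values['amazed'])  # 10
```

With the real file, open it with newline='' as the csv module documentation recommends: `with open('Keywords.txt', newline='') as f: keyword_values = load_keyword_values(f)`.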

Related

Append each line of a text as a element of a nested list

I've got a txt file that I want to read and append to a list.
This is my code so far:
F = []
for line in open(filename):
    items = line.rstrip('\r\n').split('\t')  # strip new-line characters and split on column delimiter
    items = [item.strip() for item in items]  # strip extra whitespace off data items
    F.append(items)
So far so good. The issue here is that I want each line to be appended to a list with the following structure:
F = [(4,3050),(1,4000),]
How do I make it read the file to append to that specific structure?
Solution
Firstly, make sure you are using tab characters in your text file. According to your screenshot, it seems that you are using spaces.
Secondly, to store a tuple rather than a list in your list F use F.append(tuple(items)).
Final code
F = []
for line in open(filename):
    items = line.rstrip('\r\n').split('\t')
    items = [item.strip() for item in items]
    F.append(tuple(items))  # Add tuple() here
print(F)
Output
[('4', '3050'), ('1', '4000')]
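Note that this output holds strings ('4', '3050') while the question asks for integers. A minimal sketch that converts each field with int(); calling split() with no argument splits on any run of whitespace, so it works whether the file uses tabs or spaces. `parse_int_tuples` is a hypothetical name and the sample lines are made up to match the expected output.

```python
def parse_int_tuples(lines):
    """Convert lines like '4\t3050' or '4 3050' into tuples of ints."""
    result = []
    for line in lines:
        fields = line.split()        # splits on tabs, spaces and the newline
        if fields:                   # ignore blank lines
            result.append(tuple(int(field) for field in fields))
    return result


F = parse_int_tuples(['4\t3050\n', '1 4000\n'])
print(F)  # [(4, 3050), (1, 4000)]
```

With the real file, `parse_int_tuples(open(filename))` works the same way, since a file object yields its lines.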
You can also achieve this using pandas,
import pandas as pd
df = pd.read_csv('csp_test.txt', sep=" ", header = None)
list(df.itertuples(index=False, name=None))
If you want a one-liner:
import pandas as pd
list(pd.read_csv('csp_test.txt', sep=" ", header = None).itertuples(index=False, name=None))
Either way, you will get the following output,
[(4, 3050), (1, 4000)]
This way, you don't have to worry about splitting your fields or performing any strip operations.

Take list of strings and split it based on commas into 2d list

So far I have the following code:
def display2dlist(data):
    for i in data:  # loops through list
        lines = i.strip()  # removing leading and trailing characters
        split_lines = lines.split(", ")  # splitting lines by ','
        print(split_lines)  # prints list

with open("python.csv", "r") as infile:
    data = infile.readlines()[0:]  # reads lines from file
    # print(data)
    display2dlist(data)  # calling the function display2dlist with data list as parameter
It reads lines from a csv file and saves each line as a list.
It outputs this:
['2016,London,10']
['2017,Tokyo,11']
['2018,Toronto,12']
['2018,Dubai,23']
How would I make it split each line at every comma and save the separate values, instead of saving each line as one big string? So it should look like this:
['2016','London','10']
['2017','Tokyo','11']
['2018','Toronto','12']
['2018','Dubai','23']
For example in my current code:
data[0][0]= '2'
data[0][1]= '0' #it is treating it as a big string and just going letter by letter
data[0][2]= '1'
data[1][0]= '2'
data[1][1]= '0'
data[1][2]= '1'
I want it so when I execute the previous code the output is:
data[0][0]='2016'
data[0][1]='London'
data[0][2]='10'
data[1][0]='2017'
data[1][1]='Tokyo'
data[1][2]='11'
Remove the extra space in the split separator and use this function to get a nested list:
def get_2d(data):
    listdata = []
    for i in data:
        split_lines = i.strip().split(",")  # not ", "
        listdata.append(split_lines)
    return listdata

...

with open("python.csv", "r") as infile:
    data = infile.readlines()
print(get_2d(data))
def split_data(data):  # data given as a list of lines
    return [i.strip().split(',') for i in data]  # if data is a single string, wrap it in [] first

# example:
my_data = ['2020,10,10']
split_data(my_data)
# returns
[['2020', '10', '10']]
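Splitting on ',' by hand breaks if a field itself contains a comma inside quotes. Because the input is a CSV file, the csv module can do the splitting and handle the quoting. A sketch with made-up sample rows (the quoted "Tokyo, Japan" field is an assumption added to illustrate the point), using io.StringIO in place of python.csv:

```python
import csv
import io

# In-memory stand-in for python.csv; the second row has a quoted comma.
sample = io.StringIO('2016,London,10\n2017,"Tokyo, Japan",11\n')
data = list(csv.reader(sample))

print(data[0])     # ['2016', 'London', '10']
print(data[1][1])  # Tokyo, Japan
```

With the real file: `with open("python.csv", newline="") as infile: data = list(csv.reader(infile))`, after which data[0][0] is '2016' and data[0][1] is 'London'.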

Python, check strings for values and remove other characters

f = open("new_sample.txt", "r")
for ranges_in_file in f:
    if ranges_in_file.find('ranges:') != -1:
        new_data = ranges_in_file.split(" ")
        print('success')
Hi guys, I am currently reading a .txt file line by line to find a certain value. I am able to find the line, for example ranges: [1.3,1.9,2.05,inf,1.64]. How do I store that line in a list and then remove any excess characters in the line, such as the word "ranges" and "inf"?
Given you have read the lines of a file and can get a list like
import math
ranges_in_file = [1.3, 1.9, 2.05, math.inf, 1.64]
you can use a list comprehension to keep only the items you need:
wanted = [x for x in ranges_in_file if x not in [math.inf, "ranges"]]
You can split the variable name ranges from the list using a, b = ranges_in_file.split(':')
b = b.strip()
So b contains your required list as a string (note that strip() returns a new string, so assign the result back to b). Use c = b[1:-1].split(',') to convert it into a list. Then you can iterate over the list and discard any values you don't want. (Remember that all entries of the list are now strings!)
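The approach above (split off the name, slice off the brackets, split on commas) written as a small function. The line in the question uses ':' after the name, so the sketch splits on the first colon only. `parse_range_strings` is a hypothetical name, and it assumes the values are enclosed in square brackets. As the answer says, the values stay strings.

```python
def parse_range_strings(line, unwanted=('inf',)):
    """Return the bracketed values after 'name:' as strings, minus unwanted ones."""
    _, rest = line.split(':', 1)        # split only on the first colon
    rest = rest.strip()                 # strip() returns a new string
    items = rest[1:-1].split(',')       # drop '[' and ']', then split
    return [item.strip() for item in items if item.strip() not in unwanted]


values = parse_range_strings('ranges: [1.3,1.9,2.05,inf,1.64]\n')
print(values)  # ['1.3', '1.9', '2.05', '1.64']
```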
For new_sample.txt containing:
ranges: [1.3,1.9,2.05,inf,1.64]
you can split the data first on spaces and then on commas:
with open("new_sample.txt", "r") as f:
    file = f.read().splitlines()
result = []
for i in file:
    b = i.split(" ")
    if b[0] == "ranges:":
        temp = b[1][1:-1].split(",")  # drop the brackets, then split on commas
        for j in temp:
            if j not in ["inf"]:
                result.append(j)
print(result)
OUTPUT: ['1.3', '1.9', '2.05', '1.64']
If your lines always look like this, that is, they start with "ranges: " followed by a list of values, and the only thing you want to remove is inf, you can easily turn each line into a list of floats using the map function:
line = "ranges: [1.3,1.9,2.05,inf,1.64]"
values = list(map(float, [x.strip('[]') for x in (line.split(' ')[1]).split(',') if 'inf' not in x]))
Output:
[1.3, 1.9, 2.05, 1.64] # list of float values
You can then apply this to every line of the file that starts with 'ranges:', which gives you one value list per line. Notice the use of with open(...), which is safer for files in general, because the file will always be closed properly no matter what happens.
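values = []
with open('new_sample.txt', 'r') as f:
    for line in f:
        if line.startswith('ranges:'):
            # strip the newline first so that ']' is the last character before strip('[]')
            line_values = list(map(float, [x.strip('[]') for x in (line.strip().split(' ')[1]).split(',') if 'inf' not in x]))
            values.append(line_values)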
But if your lines can be different, a more general approach is needed.
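As one slightly more general variant, sketched under the assumption that the values are always comma-separated inside square brackets after the prefix: let float() parse every entry (it accepts 'inf' and '-inf') and then drop non-finite values with math.isfinite, which also removes '-inf' and 'nan'. `ranges_from_lines` is a hypothetical name; io.StringIO stands in for new_sample.txt, and an empty list such as "ranges: []" is not handled.

```python
import io
import math


def ranges_from_lines(lines, prefix='ranges:'):
    """Collect one list of finite floats per line that starts with prefix."""
    all_values = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        inside = line[len(prefix):].strip().strip('[]')  # e.g. '1.3,1.9,...'
        numbers = [float(x) for x in inside.split(',')]  # 'inf' -> inf
        all_values.append([x for x in numbers if math.isfinite(x)])
    return all_values


sample = io.StringIO("header line\nranges: [1.3,1.9,2.05,inf,1.64]\n")
result = ranges_from_lines(sample)
print(result)  # [[1.3, 1.9, 2.05, 1.64]]
```

For the real file, pass the open file object: `with open('new_sample.txt') as f: values = ranges_from_lines(f)`.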

Reading a specific column of a text file into a list (python 3.6.3)

I know there's a million questions on this but I couldn't find one that matches what I'm looking for. Let's say I have a text file like this:
1 34
2 65
3 106
And I want to scan this file and read only the second column such that data=[34, 65, 106]. How might I go about this? Further, how could I make this program read a dataset of any length and any column specified by the user? I can do most things in simple Python, but reading files eludes me.
pandas is a useful library for tasks such as this:
import pandas as pd
df = pd.read_csv('file.txt', header=None, delimiter=r"\s+")
lst = df.iloc[:, 1].tolist()
Solution
Sounds like the case for a small helper function:
def read_col(fname, col=1, convert=int, sep=None):
    """Read text files with columns separated by `sep`.

    fname - file name
    col - index of column to read
    convert - function to convert column entry with
    sep - column separator

    If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.
    """
    with open(fname) as fobj:
        return [convert(line.split(sep=sep)[col]) for line in fobj]
res = read_col('mydata.txt')
print(res)
Output:
[34, 65, 106]
If you want the first column, i.e. at index 0:
read_col('mydata.txt', col=0)
If you want them to be floats:
read_col('mydata.txt', col=0, convert=float)
If the columns are separated by commas:
read_col('mydata.txt', sep=',')
You can use any combination of these optional arguments.
Explanation
We define a new function with default parameters:
def read_col(fname, col=1, convert=int, sep=None):
This means you have to supply the file name fname. All other arguments are optional, and the default values will be used if they are not provided when calling the function.
In the function, we open the file with:
with open(fname) as fobj:
Now fobj is an open file object. The file will be closed when we de-dent, i.e. here when we end the function.
This:
[convert(line.split(sep=sep)[col]) for line in fobj]
creates a list by going through all lines of the file. Each line is split at the separator sep. We take only the value for the column with index col. We also convert the value with the function convert, i.e. into an integer by default.
Edit
You can also skip the first line in the file:
with open(fname) as fobj:
    next(fobj)
    return [convert(line.split(sep=sep)[col]) for line in fobj]
Or, more flexibly, as an optional argument:
def read_col(fname, col=1, convert=int, sep=None, skip_lines=0):
    with open(fname) as fobj:
        # skip first `skip_lines` lines (the file must be open first)
        for _ in range(skip_lines):
            next(fobj)
        return [convert(line.split(sep=sep)[col]) for line in fobj]
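An alternative way to skip header lines is itertools.islice, which skips the first `skip_lines` lines lazily and simply yields nothing if the file is shorter, whereas next() raises StopIteration in that case. This is a sketch of the same helper under a hypothetical name, read_col_skipping; it also ignores blank lines, which is an extra assumption. The sample data is written to a temporary file only for the demonstration.

```python
import itertools
import os
import tempfile


def read_col_skipping(fname, col=1, convert=int, sep=None, skip_lines=0):
    """Like read_col, but skip the first skip_lines lines via islice."""
    with open(fname) as fobj:
        lines = itertools.islice(fobj, skip_lines, None)  # start after the header
        # line.strip() is falsy for blank lines, which are skipped
        return [convert(line.split(sep)[col]) for line in lines if line.strip()]


# Demo: a small file with a one-line header, reading column index 1.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('id value\n1 34\n2 65\n3 106\n')
    path = tmp.name

result = read_col_skipping(path, skip_lines=1)
print(result)  # [34, 65, 106]
os.remove(path)
```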
You can use a list comprehension:
data = [b for a, b in [i.strip('\n').split() for i in open('filename.txt')]]
You will first need to get a list of all lines via
fileobj.readlines()
Then you can run a for loop to iterate through the lines one by one; for each line, split it on whitespace.
Then, in the same for loop, append the item at index 1 of the split result to an existing list, which will be your final result:
with open('file.txt') as fil:
    a = fil.readlines()
t = []
for f in a:
    e = f.split()  # split() with no argument also drops the trailing newline
    t.append(e[1])
Is the file delimited?
You'll want to first open the file:
with open('file.txt', 'r') as f:
    filedata = f.readlines()
Create a list, loop through the lines and split each line into a list based on your delimiter, and then append the indexed item in the list to your original list.
data = []
for line in filedata:
    columns = line.split('*your delimiter*')
    data.append(columns[1])
Then the data list should contain what you want.

python: searching a text file for values in a list

I'm trying to search a large text file for a list of words. Rather than running a command over and over for each word, I thought a list would be easier, but I'm not sure how to go about it. The script below more or less works with the string value, but I would like to replace the 'string' below with every value from the 'dict' list.
import csv
count = 0
dic = open('dictionary', 'r')  # changed from "dict" in original post
reader = csv.reader(dic)
allRows = [row for row in reader]
with open('bigfile.log', 'r') as inF:
    for line in inF:
        if 'string' in line:  # <---replace the 'string' with dict values
            count += 1
print(count)
Convert your file to a set instead:
with open('dictionary', 'r') as d:
    sites = set(l.strip() for l in d)
Now you can do efficient membership tests per line, provided you can split your lines:
count = 0
with open('bigfile.log', 'r') as inF:
    for line in inF:
        elements = line.split()
        if sites.intersection(elements):
            count += 1
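Note that the set-based answer matches whole whitespace-separated tokens, while the question's `'string' in line` is a substring test (it would also match "substring"). A sketch showing both behaviours side by side, using in-memory lists instead of the 'dictionary' and 'bigfile.log' files; the function names and sample data are made up for illustration.

```python
def count_token_matches(lines, words):
    """Count lines that contain at least one word as a whole token."""
    word_set = set(words)
    return sum(1 for line in lines if word_set.intersection(line.split()))


def count_substring_matches(lines, words):
    """Count lines in which at least one word appears anywhere as a substring."""
    return sum(1 for line in lines if any(word in line for word in words))


log_lines = [
    'GET example.com 200\n',
    'GET notexample.com 404\n',
    'POST other.org 200\n',
]
dictionary_words = ['example.com', 'other.org']

token_count = count_token_matches(log_lines, dictionary_words)
substring_count = count_substring_matches(log_lines, dictionary_words)
print(token_count)      # 2
print(substring_count)  # 3
```

The token version is usually faster on a large log, because each line costs one split plus set lookups instead of one substring scan per dictionary word.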
