Python and appending items to text and excel file - python

I am wondering how I can write the items of a list into separate rows, because for some reason my Python code produces a strange layout.
import csv
Yvalues = [1, 2, 3, 4, 5]
file_out = open('file.csv','wb')
mywriter=csv.writer(file_out)
for item in Yvalues:
    mywriter.writerow(Yvalues)
file_out.close()
When I open my csv file I get this:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
I don't want that layout, how can I make it so that it goes to a specific row all going down like this:
1
2
3
4
5

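One way to do this is to use a newline as the delimiter:
import csv
Yvalues = [1, 2, 3, 4, 5]
# 'wb' is the Python 2 mode; on Python 3 use open('file.csv', 'w', newline='')
file_out = open('file.csv','wb')
mywriter=csv.writer(file_out, delimiter = '\n')
mywriter.writerow(Yvalues)
file_out.close()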
This will give you:
1
2
3
4
5

You need to get your values correctly mapped. Try this version:
import csv
y_values = [[1], [2], [3], [4], [5]]
with open("file.csv", "w") as file_out:
    mywriter=csv.writer(file_out)
    mywriter.writerows(y_values)
You were using writerow, which takes the given value as the contents of a single row. Since you are passing it the same list every time, it assumes you want five columns in each of five rows.
In my version, I am using writerows, and my list contains one inner list per row. Since there is only one item in each inner list, only that item is written to the row.
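If the values start out as a flat list, as in the question, each one can be wrapped in its own one-item row on the way into writerows. This is a minimal sketch, assuming Python 3 (hence newline="" when opening the file). The helper name write_column and the temporary output path are made up for illustration.

```python
import csv
import os
import tempfile


def write_column(path, values):
    """Write each value in `values` to its own row of a CSV file."""
    # newline="" lets the csv module control the line endings (Python 3).
    with open(path, "w", newline="") as file_out:
        writer = csv.writer(file_out)
        # Wrap every value in a one-item list so it becomes one row.
        writer.writerows([value] for value in values)


# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "file.csv")
write_column(out_path, [1, 2, 3, 4, 5])
```

Reading the file back with csv.reader should give five rows with one field each.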

Related

Copy Each Line of a List into a Separate Text File

I have two lists of equal length. One contains the names of the files I would like to create while the other is a 2-d list that has data I would like to copy into a text file with a name from the list. I want each element from the 2D list to have its own separate text file. The example code is as follows:
Source Code:
example_data = [[1,2,3],[4,5,6],[7,8,9]]
example_names = ['name1.txt', 'name2.txt', 'name3.txt']
for name in example_names:
    for ele in example_data:
        with open(name, 'w') as f:
            f.writelines('0 ' + str(ele).replace('[', '').replace(']', '').replace(',', ''))
Current Output:
name1.txt,
data within file: 0 7 8 9
name2.txt,
data within file: 0 7 8 9
name3.txt,
data within file: 0 7 8 9
Expected output:
name1.txt,
data within file: 0 1 2 3
name2.txt,
data within file: 0 4 5 6
name3.txt,
data within file: 0 7 8 9
Logic
You can use zip to pair each file name with its data, and str.join to turn the list into a string. Because the list contains ints, each element has to be converted to str first.
Solution
Source Code
example_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
example_names = ["name1.txt", "name2.txt", "name3.txt"]
for file_name, data in zip(example_names, example_data):
    with open(file_name, "w") as f:
        f.write(f"0 {' '.join([str(x) for x in data])}")
Output
name1.txt
0 1 2 3
name2.txt
0 4 5 6
name3.txt
0 7 8 9
The problem is that you're looping over all data for each file. Instead, you want to get only the data associated with that file. To do this, you can use enumerate() to get a list index. Also, you do not need f.writelines() because you're only writing one line. Instead, use f.write(). Here's the code you're looking for:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
files = ['f1.txt', 'f2.txt', 'f3.txt']
for i, file in enumerate(files):
    with open(file, 'w') as f:
        f.write('0 ' + str(data[i]).replace('[', '').replace(']', '').replace(',', ''))

How to read after a space until the next space in Python

I have this program:
import sys
import itertools
from itertools import islice
fileLocation = input("Input the file location of ScoreBoard: ")
input1 = open(fileLocation, "rt")
amountOfLines = 0
for line in open('input1.txt').readlines( ):
    amountOfLines += 1
timestamps = [line.split(' ', 1)[0][0:] for line in islice(input1, 2, amountOfLines)]
teamids = [line.split(' ', 1)[0][0:] for line in islice(input1, 2, amountOfLines)]
print(teamids)
and this text file:
1
5 6
1 5 1 5 0
1 4 1 4 1
2 1 2 1 1
2 2 3 1 1
3 5 2 1 1
4 4 5 4 1
For teamids, I want to read the text between the first space and the next space on each line, starting from the second line. I have already managed the line offset, but I don't see how to read from the first space to the next. For timestamps I have managed this, but only from the first character to the first space, and I don't know how to do the same for teamids. Any help would be much appreciated.
Here's one suggestion showcasing a nice use case of zip to transpose your array:
lines = open(fileLocation, 'r').readlines()[2:]
array = [[int(x) for x in line.split()] for line in lines]
transpose = list(zip(*filter(None, array)))
# now we can do this:
timestamps = transpose[0] # (1, 1, 2, 2, 3, 4)
teamids = transpose[1] # (5, 4, 1, 2, 5, 4)
This exploits the fact that zip(*some_list) returns the transpose of some_list.
Be aware that the number of columns you get equals the length of the shortest row, which is one reason why I included the call to filter: it removes the empty rows produced by blank lines.
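The effect of the shortest row is easy to see on a small case. The sketch below uses two rows from the question's file plus an empty row, standing in for a blank line. The variable names are illustrative only.

```python
# Two data rows plus an empty row, as a blank line in the file would produce.
rows = [[1, 5, 1, 5, 0], [1, 4, 1, 4, 1], []]

# Without filtering, the empty row truncates every column to nothing.
print(list(zip(*rows)))  # []

# filter(None, ...) drops falsy items, which includes empty lists.
columns = list(zip(*filter(None, rows)))
timestamps, teamids = columns[0], columns[1]
print(timestamps)  # (1, 1)
print(teamids)     # (5, 4)
```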

Calculate number of items in one list are in another

Let's say I have two very large lists (e.g. 10 million rows) with some values or strings. I would like to figure out how many items from list1 are in list2.
As such this can be done by:
true_count = 0
false_count = 0
for i, x in enumerate(list1):
    print(i)
    if x in list2:
        true_count += 1
    else:
        false_count += 1
print(true_count)
print(false_count)
This will do the trick, however, if you have 10 million rows, this could take quite some time. Is there some sweet function I don't know about that can do this much faster, or something entirely different?
Using Pandas
Here's how you will do it using Pandas dataframe.
import pandas as pd
import random
list1 = [random.randint(1,10) for i in range(10)]
list2 = [random.randint(1,10) for i in range(10)]
df1 = pd.DataFrame({'list1':list1})
df2 = pd.DataFrame({'list2':list2})
print (df1)
print (df2)
print (all(df2.list2.isin(df1.list1).astype(int)))
I am just picking 10 rows and generating 10 random numbers:
List 1:
list1
0 3
1 5
2 4
3 1
4 5
5 2
6 1
7 4
8 2
9 5
List 2:
list2
0 2
1 3
2 2
3 4
4 3
5 5
6 5
7 1
8 4
9 1
The output of the final print statement will be:
True
The random lists I checked against are:
list1 = [random.randint(1,100000) for i in range(10000000)]
list2 = [random.randint(1,100000) for i in range(5000000)]
Ran a test with 10 mil. random numbers in list1, 5 mil. random numbers in list2, result on my mac came back in 2.207757880999999 seconds
Using Set
Alternatively, you can convert each list into a set and check whether one set is a subset of the other.
set1 = set(list1)
set2 = set(list2)
print (set2.issubset(set1))
Comparing the results of the run, set is also fast. It came back in 1.6564296570000003 seconds
You can convert the lists to sets and compute the length of the intersection between them.
len(set(list1) & set(list2))
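Note that the intersection counts distinct values, so duplicates in list1 are counted only once. If the per-item totals from the question's loop are what is needed (true_count and false_count), one option is to keep the counting logic but test membership against a set, which is a constant-time lookup on average instead of a scan of list2. This is a minimal sketch with small made-up lists, and it assumes the items are hashable.

```python
list1 = [3, 5, 4, 1, 5, 9, 9]
list2 = [2, 3, 2, 4, 3, 5]

# Build the set once; membership tests are O(1) on average.
lookup = set(list2)

# Count every item of list1 (duplicates included) that occurs in list2.
true_count = sum(1 for x in list1 if x in lookup)
false_count = len(list1) - true_count

print(true_count)   # 4
print(false_count)  # 3
```

For the same lists, len(set(list1) & set(list2)) gives 3, because the value 5 is counted once.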
You can also use NumPy: convert both lists with np.array().
Both are then one-dimensional arrays, so you can use np.intersect1d() to find the common values and count them with .size. Note that intersect1d returns the unique common values.
import numpy as np
lst = [1, 7, 0, 6, 2, 5, 6]
lst2 = [1, 8, 0, 6, 2, 4, 6]
a_list=np.array(lst)
b_list=np.array(lst2)
c = np.intersect1d(a_list, b_list)
print (c.size)

Matrix as a dictionary; is it safe?

I know that the order of the keys is not guaranteed and that's OK, but what exactly does it mean that the order of the values is not guaranteed as well*?
For example, I am representing a matrix as a dictionary, like this:
signatures_dict = {}
M = 3
for i in range(1, M):
    row = []
    for j in range(1, 5):
        row.append(j)
    signatures_dict[i] = row
print signatures_dict
Are the columns of my matrix correctly constructed? Let's say I have 3 rows and at this signatures_dict[i] = row line, row will always have 1, 2, 3, 4, 5. What will signatures_dict be?
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
or something like
1 2 3 4 5
1 4 3 2 5
5 1 3 4 2
? I am worried about cross-platform support.
In my application, the rows are words and the columns documents, so can I say that the first column is the first document?
*Are order of keys() and values() in python dictionary guaranteed to be the same?
You are guaranteed to have 1 2 3 4 5 in each row; the elements will not be reordered. The lack of ordering of values() refers to the fact that if you call signatures_dict.values(), the values could come out in any order. But the values are the rows, not the elements of each row. Each row is a list, and lists maintain their order.
If you want a dict which maintains order, Python has that too: https://docs.python.org/2/library/collections.html#collections.OrderedDict
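To make the distinction concrete, here is a small sketch in Python 3 syntax (the question itself uses the Python 2 print statement). It builds the same kind of dictionary and reads the rows back by sorted key, so the row order does not depend on the dictionary at all. The names matrix and first_column are illustrative.

```python
M = 3
signatures_dict = {}
for i in range(M):
    # Each value is a list, and a list always keeps its element order.
    signatures_dict[i] = [1, 2, 3, 4, 5]

# Iterate over sorted keys so the row order never depends on the dict.
matrix = [signatures_dict[key] for key in sorted(signatures_dict)]

# The first column is the first element of every row.
first_column = [row[0] for row in matrix]
print(matrix[2])     # [1, 2, 3, 4, 5]
print(first_column)  # [1, 1, 1]
```

Since Python 3.7, a plain dict also preserves insertion order, but sorting the keys keeps the intent explicit on any version.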
Why not use a list of lists as your matrix? It would have whatever order you gave it:
In [1]: matrix = [[i for i in range(4)] for _ in range(4)]
In [2]: matrix
Out[2]: [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
In [3]: matrix[0][0]
Out[3]: 0
In [4]: matrix[3][2]
Out[4]: 2

Assigning list to an array in python

I have a data file "list_2_array.dat" as shown below. First, I want to read it and then I want to take control over fourth column elements for further mathematical operations.
1 2 3 10
4 5 6 20
1 3 5 30
2 1 4 40
3 2 3 50
I tried following piece of code
b_list = []
file=open('/path_to_file/list_2_array.dat', 'r')
m1=[(i.strip()) for i in file]
for j in m1:
    b_list.append(j.replace('\n','').split(' '))
for i in range(5):
    print b_list[i][3]
which gives output
10
20
30
40
50
I don't want to print the elements; I want to assign the fourth-column elements to a 1-D array so that I can easily process them later. I tried several ways to do this, such as the one shown below, but they did not work:
import numpy as np
for i in range(5):
    arr = array (b_list[i][3])
import numpy as np
f=open('/path_to_file/list_2_array.dat', 'r')
l = []
for line in f.readlines():
    l.append(int(line.strip().split()[-1]))
array=np.array(l)
or, more Pythonic I guess:
f=open('/path_to_file/list_2_array.dat', 'r')
l = [int(line.strip().split()[-1]) for line in f.readlines()]
array=np.array(l)
data = """1 2 3 10
4 5 6 20
1 3 5 30
2 1 4 40
3 2 3 50"""
fourth = [int(line.split()[3]) for line in data.split("\n")]
print(fourth)
Output:
[10, 20, 30, 40, 50]
def get_last_col(file):
    last_col = [int(line.split()[-1]) for line in open(file)]
    return last_col
First of all, never use names like str, file, or int for your variables, because they shadow built-in names.
Next, you were nearly there:
b_list = []
c_list = []
file=open('/path_to_file/list_2_array.dat', 'r')
m1=[(i.strip()) for i in file]
for j in m1:
    b_list.append(j.replace('\n','').split(' '))
for i in range(5):
    c_list.append(b_list[i][3])
print c_list
I don't really like this solution, so I adapted @user2994666's solution:
file_location = "/path_to_file/list_2_array.dat"
def get_last_col(file_location):
    last_col = [int(line.split()[-1]) for line in open(file_location)]
    return last_col
print get_last_col(file_location)
Note that [-1] yields the last column, which is fine in your case. If your file had five columns and you were still interested in the fourth, you would use [3] instead of [-1].
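Since the goal is a NumPy array, another option is to let NumPy parse the file and pick the column by index. This is a sketch in Python 3 syntax that uses an in-memory io.StringIO as a stand-in for the data file. It assumes whitespace-separated integer columns, as in the question's sample.

```python
import io

import numpy as np

# Stand-in for the data file; a path string would work here as well.
data = io.StringIO("1 2 3 10\n4 5 6 20\n1 3 5 30\n2 1 4 40\n3 2 3 50\n")

# usecols=(3,) selects the fourth column (zero-based index 3).
fourth = np.loadtxt(data, dtype=int, usecols=(3,))

print(fourth)        # [10 20 30 40 50]
print(fourth.sum())  # 150
```

The result is a 1-D array, so further arithmetic can be applied to the whole column at once.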
