I have a data frame (possibly a list):
A = ['01', '20', '02', '25', '26']
B = ['10', '13', '14', '64', '32']
I would like to compare list A with list B in the following way:
Each string in the left column is compared with each string in the right column. Two strings are combined when they share a boundary digit (the last digit of the left string equals the first digit of the right string), and the duplicated digit is removed during (or after) merging. Why was the string '010' removed? Because each digit can occur only once.
You can perform a couple of string slicing operations and then merge on the common digit.
a
    A
0  01
1  20
2  02
3  25
4  26

b
    B
0  10
1  13
2  14
3  64
4  32
a['x'] = a.A.str[-1]
b['x'] = b.B.str[0]
b['B'] = b.B.str[1:]
m = a.merge(b)
You could also do this in a single line with assign, without disrupting the original dataframes:
m = a.assign(x=a.A.str[-1]).merge(b.assign(x=b.B.str[0], B=b.B.str[1:]))
To keep only the strings whose digits are all unique, convert each string to a set and compare the set's length with the string's length.
v = (m['A'] + m['B'])
v.str.len() == v.apply(set).str.len()
0 False
1 True
2 True
3 True
dtype: bool
v[v.str.len() == v.apply(set).str.len()].tolist()
['013', '014', '264']
Something you should be aware of is that, without quotes, you're actually passing integers, not strings. In Python 2, A = [01, 20, 02, 25, 26] is the same as A = [1, 20, 2, 25, 26], because a leading zero marks an octal literal (so 08 or 09 would not even parse). In Python 3, a literal such as 01 is a SyntaxError. You should therefore use strings instead of integers, like A = ['01', '20', '02', '25', '26']. If you already have lists of integers and know they are all <= 99, you can convert them to zero-padded strings like this:
A = ['%02d' % i for i in A]
B = ['%02d' % i for i in B]
(You could also give these different names if you want to preserve the integer lists.) The solution is then:
final = []
for i in A:
    for j in B:
        # join when the last digit of i equals the first digit of j
        if i[-1] == j[0]:
            final.append(i + j[1:])
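The loop above collects every joined string, including '010', which the question wants excluded. A minimal sketch of the same loop with the uniqueness check added, in Python 3; the function name join_on_boundary is made up for illustration and is not from the original answer:

```python
def join_on_boundary(left, right):
    """Join strings that share a boundary digit, keeping only results
    in which every digit occurs at most once."""
    results = []
    for a in left:
        for b in right:
            if a[-1] == b[0]:
                combined = a + b[1:]  # drop the duplicated boundary digit
                if len(set(combined)) == len(combined):
                    results.append(combined)
    return results


# zfill(2) restores the leading zero if the data arrives as integers
A = [str(n).zfill(2) for n in [1, 20, 2, 25, 26]]
B = [str(n).zfill(2) for n in [10, 13, 14, 64, 32]]
print(join_on_boundary(A, B))  # ['013', '014', '264']
```

This gives the same result as the pandas answer, at the cost of comparing every pair of strings.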
I’m new to Python and would like to write a simple function. I’d like to read the input array and, if a value is more than 4 digits, split it and print the first part, then the second part.
I’m having trouble splitting the number and getting rid of the 0’s in between; for example, 1006 would become 1, 6.
Input array:
a = [ 1002, 2, 3, 7 ,9, 15, 5992]
Desired output in console:
1, 2
2
3
7
9
15
59,92
You can abstract the splitting into a function and then use a list comprehension to map that function over the list. The following matches more of what you had before one of your edits, and it can of course be tweaked:
def split_num(n):
    s = str(n)
    if len(s) < 4:
        return 0, n
    else:
        a, b = s[:2], s[2:]
        if a[1] == '0': a = a[0]
        return int(a), int(b)

nums = [1002, 2, 3, 7, 9, 15, 5992]
result = [split_num(n) for n in nums]
for a, b in result:
    print(a, b)
Output:
1 2
0 2
0 3
0 7
0 9
0 15
59 92
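The version above prints a placeholder 0 in front of the short numbers. If the goal is output closer to the one in the question, one option is to return a tuple of variable length and join its parts when printing. This is only a sketch, written for Python 3 and assuming non-negative integers; split_parts is a hypothetical name:

```python
def split_parts(n):
    """Return (n,) for short numbers, or two halves for numbers with
    4 or more digits, with the zeros next to the split removed."""
    s = str(n)
    if len(s) < 4:
        return (n,)
    first, second = s[:2], s[2:]
    # '10' -> 1 (trailing zero stripped); int('02') -> 2 (leading zero dropped)
    return (int(first.rstrip('0')), int(second))


nums = [1002, 2, 3, 7, 9, 15, 5992]
for n in nums:
    print(', '.join(str(part) for part in split_parts(n)))
```

For the sample list this prints "1, 2" for 1002, "59, 92" for 5992, and every shorter number on its own line unchanged.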
If you just want a list of the non-zero digits in the original list, you can use this:
a = [ 1002, 2, 3, 7 ,9, 15, 5992]
strings = [str(el) for el in a]
str_digits = [char for el in strings for char in el if char != '0']
and if you want the digits as ints, you can do:
int_digits = [int(el) for el in str_digits]
or go straight to
int_digits = [int(char) for el in strings for char in el if char != '0']
I'm not sure what the logic behind your desired output is, though, so I'm sorry if this isn't helpful.
I'm trying to index into a multi-line string so that each digit's value is the same as its index. Here's my attempt:
table='''012
345
678'''
print (table[4])
If I execute the above, I get an output of 3 instead of 4.
I am trying to get the number i with print(table[i]).
What is the simplest way of getting the number corresponding to table[i] without using a list? I have to use while loops later to replace values in the table, and using lists would be very troublesome. Thanks.
Your string contains whitespace: there is a newline character at index 3, after each row (\n on Linux; text with Windows line endings would have \r\n, taking up two positions). You can clean your text by removing the newlines:
table='''012
345
678'''
print (table[4]) #3 - because [3] == \n
print(table.replace("\n","")[4]) # 4
You can view all characters in your "table" like so:
print(repr(table))
# print the ordinal value of each character, plus the character itself if it is printable
for c in table:
    print(ord(c), c if ord(c) > 31 else "")
Output:
'012\n345\n678'
48 0
49 1
50 2
10
51 3
52 4
53 5
10
54 6
55 7
56 8
On a side note: if your table does not change, you might want to build a lookup dict so that you don't have to strip the newlines from your string every time:
table='''012
345
678'''
indexes = dict( enumerate(table.replace("\n","")))
print(indexes)
Output:
{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8'}
so you can use indexes[3] to get the string '3'.
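Since the question also mentions replacing values later, another option is to keep the string as it is and translate the logical index into a string index. The sketch below assumes every row has the same width and is separated by a single \n; the helper names are made up for illustration:

```python
def cell_index(i, width):
    """Map logical cell number i to its position in the string,
    skipping one newline for every completed row."""
    return i + i // width


def get_cell(table, i, width=3):
    return table[cell_index(i, width)]


def set_cell(table, i, value, width=3):
    # Strings are immutable, so build a new string around the changed cell
    k = cell_index(i, width)
    return table[:k] + value + table[k + 1:]


table = '''012
345
678'''
print(get_cell(table, 4))        # 4
table = set_cell(table, 4, 'X')
print(table)
```

The table keeps its multi-line layout, so it can still be printed directly after each replacement.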
I have a variable that is mixed with letters and numbers. The letters range from A:Z and the numbers range from 2:8. I want to re-code this variable so that it is all numeric with the letters A:Z now becoming numbers 1:26 and the numbers 2:8 becoming numbers 27:33.
For example, I would like this variable:
Var1 = c('A',2,3,8,'C','W',6,'T')
To become this:
Var1 = c(1,27,28,33,3,23,31,20)
In R I can do this using 'match' like this:
Var1 = as.numeric(match(Var1, c(LETTERS, 2:8)))
How can I do this using python? Pandas?
Thank you
Make a dictionary and map the values:
import string
import numpy as np
dct = dict(zip(list(string.ascii_uppercase) + list(np.arange(2, 9)), np.arange(1, 34)))
# If they are strings of numbers, not integers use:
#dct = dict(zip(list(string.ascii_uppercase) + ['2', '3', '4', '5', '6', '7', '8'], np.arange(1, 34)))
df.col_name = df.col_name.map(dct)
An example:
import pandas as pd
df = pd.DataFrame({'col': [2, 4, 6, 3, 5, 'A', 'B', 'D', 'F', 'Z', 'X']})
df.col.map(dct)
Outputs:
0 27
1 29
2 31
3 28
4 30
5 1
6 2
7 4
8 6
9 26
10 24
Name: col, dtype: int64
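For a plain list, the same idea works without pandas or NumPy. The sketch below mirrors R's match(Var1, c(LETTERS, 2:8)) in Python 3 by converting every value to a string before the lookup, so integers and strings of digits are treated alike; the variable names are illustrative:

```python
import string

# Lookup order: 'A'..'Z' followed by '2'..'8', numbered from 1 as in R
codes = list(string.ascii_uppercase) + [str(d) for d in range(2, 9)]
position = {code: i for i, code in enumerate(codes, start=1)}

var1 = ['A', 2, 3, 8, 'C', 'W', 6, 'T']
recoded = [position[str(v)] for v in var1]
print(recoded)  # [1, 27, 28, 33, 3, 23, 31, 20]
```

A value outside A-Z or 2-8 raises a KeyError here; position.get(str(v)) would return None instead, which is closer to R's NA.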
I think this question could help you:
Replacing letters with numbers with its position in alphabet
Then you just need to apply the alphabet_position function from that question to your df column:
df.Var1.apply(alphabet_position)
You can also try this:
for i in range(len(var1)):
    if type(var1[i]) == int:
        var1[i] = var1[i] + 25  # 2..8 -> 27..33
    else:
        var1[i] = ord(var1[i].lower()) - 96  # 'a'..'z' -> 1..26
I have a file with 4 column data, and I want to prepare a final output file which is sorted by the first column. The data file (rough.dat) looks like:
1 2 4 9
11 2 3 5
6 5 7 4
100 6 1 2
The code I am using to sort by the first column is:
with open('rough.dat','r') as f:
    lines=[line.split() for line in f]
a=sorted(lines, key=lambda x:x[0])
print a
The result I am getting is strange, and I think I'm doing something silly!
[['1', '2', '4', '9'], ['100', '6', '1', '2'], ['11', '2', '3', '5'], ['6', '5', '7', '4']]
As you can see, the first column is not sorted in ascending numeric order; instead, the numbers starting with '1' take priority. A zero after the '1', i.e. 100, takes priority over 11!
Strings are compared lexicographically (dictionary order):
>>> '100' < '6'
True
>>> int('100') < int('6')
False
Converting the first item to int in key function will give you what you want.
a = sorted(lines, key=lambda x: int(x[0]))
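To see the difference without needing the file, here is a small Python 3 sketch that uses the rows from the question as an in-memory string (the variable names are illustrative):

```python
data = """1 2 4 9
11 2 3 5
6 5 7 4
100 6 1 2"""

lines = [line.split() for line in data.splitlines()]

# String keys compare character by character, so '100' sorts before '11'
by_text = sorted(lines, key=lambda row: row[0])
# Integer keys compare by numeric value
by_number = sorted(lines, key=lambda row: int(row[0]))

print([row[0] for row in by_text])    # ['1', '100', '11', '6']
print([row[0] for row in by_number])  # ['1', '6', '11', '100']
```

The rows themselves still hold strings after sorting; only the key used for the comparison is converted.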
Your numbers are sorted lexicographically because they are strings, not integers. As a NumPy alternative, you can load your data with np.loadtxt and then reorder the rows by their first column:
import numpy as np
array = np.loadtxt('rough.dat')
array = array[array[:, 0].argsort()]  # reorder rows by the first column
print(array)
[[  1.   2.   4.   9.]
 [  6.   5.   7.   4.]
 [ 11.   2.   3.   5.]
 [100.   6.   1.   2.]]
How do I add up all the numbers entered in a string?
Ex:
input:
5 5 3 5
output
18
and it must support negative numbers ('-')
Ex.
input
-5 5 3 5
output
8
I wrote something like this:
x = raw_input()
print sum(map(int,str(x)))
and it adds normally if x > 0.
But what should I do about '-'?
I understand that I need to use split(), but I don't know enough to make it work.
You're close, you just need to split the string on spaces. Splitting will produce the list of strings ['-5', '5', '3', '5']. Then you can do the rest of the map and sum as you intended.
>>> s = '-5 5 3 5'
>>> sum(map(int, s.split()))
8
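The original attempt goes wrong because map(int, str(x)) walks over the string one character at a time, so the '-' and the spaces reach int() on their own. A small Python 3 sketch that shows the difference; sum_numbers is a hypothetical helper name, and in a script you would pass it input() instead of a literal:

```python
def sum_numbers(text):
    """Sum the whitespace-separated integers in text, including negatives."""
    return sum(int(token) for token in text.split())


print(list('-5 5'))              # ['-', '5', ' ', '5']  (single characters)
print('-5 5'.split())            # ['-5', '5']           (whole tokens)
print(sum_numbers('5 5 3 5'))    # 18
print(sum_numbers('-5 5 3 5'))   # 8
```

Calling split() with no argument also copes with repeated spaces and surrounding whitespace, which split(' ') does not.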
It's simple:
>>> input = raw_input('Enter your input: ')
Enter your input: 5 5 10 -10
>>> list_numbers = [int(item) for item in input.split(' ')]
>>> print list_numbers
[5, 5, 10, -10]
After that, you can do whatever you want with the list, for example sum(list_numbers) :)
You can use the following line:
sum(map(int, raw_input().split()))