getting index of a multiple line string - python

I'm trying to get an integer out of a multi-line string, where the value is the same as its index. Here's my attempt:
table='''012
345
678'''
print (table[4])
If I execute the above, I get an output of 3 instead of 4.
I am trying to get the number i with print(table[i]).
What is the simplest way of getting the number corresponding to table[i] without using a list? I have to use while loops later to replace values in the table, and using lists would be very troublesome. Thanks.

Your string contains a whitespace character (a newline) at position 3 - '\n' inside a triple-quoted literal; text read from a Windows file may contain '\r\n' instead. You can clean your text by removing it:
table='''012
345
678'''
print (table[4]) #3 - because [3] == \n
print(table.replace("\n","")[4]) # 4
You can view all characters in your "table" like so:
print(repr(table))
# print the ordinal value of each character, plus the character itself if it is printable
for c in table:
    print(ord(c), c if ord(c) > 31 else "")
Output:
'012\n345\n678'
48 0
49 1
50 2
10
51 3
52 4
53 5
10
54 6
55 7
56 8
On a side note: if your table does not change, you might want to build a lookup dict so you do not have to keep replacing characters in your string all the time:
table='''012
345
678'''
indexes = dict( enumerate(table.replace("\n","")))
print(indexes)
Output:
{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8'}
so you can do indexes[3] to get the '3' string.
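If you would rather avoid both lists and repeated replace calls, here is a minimal sketch (assuming the table is always 3 characters wide, as in the question): skip the newlines arithmetically, since every complete row of 3 digits is followed by exactly one '\n'.
table='''012
345
678'''
width = 3                      # assumed fixed row width
i = 4
# every complete row before position i contributes one '\n' that must be skipped
print(table[i + i // width])   # 4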

Related

Splitting Pandas dataframe into multiple mini-dataframes

This is the second part of a program I'm working on. I have a pandas dataframe whose columns are:
Title|df1_data1|df1_data2|df1_data3|df1_data4|df2_data1|df2_data2|df2_data3|df2_data4|df3_data1|df3_data2|df3_data3|df3_data4
But there are two rules:
The df will NOT always consist of 3 files (df1, df2, df3); there can be more or fewer.
There are ALWAYS 4 pieces of data per file.
I have the next step of the code written but the input needs multiple mini-dataframes of this bigger one.
So for this example of three files I need to split the dataframe into:
1. |Title|df1_data1|df1_data2|df1_data3|df1_data4|
2. |Title|df2_data1|df2_data2|df2_data3|df2_data4|
3. |Title|df3_data1|df3_data2|df3_data3|df3_data4|
I'm currently trying to figure this out by looping through the headers and creating a dataframe for every four headers (not counting Title), but I haven't got it working yet. Please help!
Here's the big dataframe (remember the rules):
import pandas as pd

thisdict = {'Title': ['aaarrr', 'hahahamhm', 'yaaahooo', 'yaahoo', 'oopsymhm', 'ayorrr'],
            'df1_data1': ['324', '123', '444', 'NOTHING', 'NOTHING', 'NOTHING'],
            'df1_data2': ['4314', '4321', '7658', 'NOTHING', 'NOTHING', 'NOTHING'],
            'df1_data3': ['342', '111', '235', 'NOTHING', 'NOTHING', 'NOTHING'],
            'df1_data4': ['325', '542', '523', 'NOTHING', 'NOTHING', 'NOTHING'],
            'df2_data1': ['1', 'NOTHING', 'NOTHING', '4', '3', 'NOTHING'],
            'df2_data2': ['2', 'NOTHING', 'NOTHING', '3', '2', 'NOTHING'],
            'df2_data3': ['3', 'NOTHING', 'NOTHING', '2', '4', 'NOTHING'],
            'df2_data4': ['4', 'NOTHING', 'NOTHING', '1', '1', 'NOTHING'],
            'df3_data1': ['NOTHING', 'NOTHING', 'NOTHING', '2', '67', '4'],
            'df3_data2': ['NOTHING', 'NOTHING', 'NOTHING', '73', '2', '7'],
            'df3_data3': ['NOTHING', 'NOTHING', 'NOTHING', '2', '4', '5'],
            'df3_data4': ['NOTHING', 'NOTHING', 'NOTHING', '1', '0', '9']
            }
dataframe = pd.DataFrame(thisdict)
You can append Title to the index (in case there are duplicate values of Title). Then create a dict of dataframes by slicing the columns into groups of four, based on the total number of columns.
df2 = dataframe.set_index('Title', append=True) # append for just in case duplicate values of Title
df_s = {(i+1): df2.iloc[:, i*4: i*4+4].reset_index(level=-1) for i in range(len(df2.columns) // 4)}
Then you can access the individual split dataframes with df_s[i], e.g.
print(df_s[1])
Title df1_data1 df1_data2 df1_data3 df1_data4
0 aaarrr 324 4314 342 325
1 hahahamhm 123 4321 111 542
2 yaaahooo 444 7658 235 523
3 yaahoo NOTHING NOTHING NOTHING NOTHING
4 oopsymhm NOTHING NOTHING NOTHING NOTHING
5 ayorrr NOTHING NOTHING NOTHING NOTHING
print(df_s[2])
Title df2_data1 df2_data2 df2_data3 df2_data4
0 aaarrr 1 2 3 4
1 hahahamhm NOTHING NOTHING NOTHING NOTHING
2 yaaahooo NOTHING NOTHING NOTHING NOTHING
3 yaahoo 4 3 2 1
4 oopsymhm 3 2 4 1
5 ayorrr NOTHING NOTHING NOTHING NOTHING
print(df_s[3])
Title df3_data1 df3_data2 df3_data3 df3_data4
0 aaarrr NOTHING NOTHING NOTHING NOTHING
1 hahahamhm NOTHING NOTHING NOTHING NOTHING
2 yaaahooo NOTHING NOTHING NOTHING NOTHING
3 yaahoo 2 73 2 1
4 oopsymhm 67 2 4 0
5 ayorrr 4 7 5 9
You can set Title as the index and use filter to pick out each group of columns:
df = df.set_index('Title')
dfs = {'df%s' % i: df.filter(like='df%s' % i).reset_index()
       for i in range(1, len(df.columns) // 4 + 1)}  # works for any number of files
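As a further hedged sketch (not from the original answers): because every column except Title carries a dfN prefix, you can also derive the groups from the column names themselves, so the number of files never has to be hard-coded or counted.
prefixes = sorted({c.split('_')[0] for c in dataframe.columns if c != 'Title'})
mini = {p: dataframe[['Title'] + [c for c in dataframe.columns if c.startswith(p + '_')]]
        for p in prefixes}
print(mini['df2'])  # same content as df_s[2] above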

How to remove strings between parentheses (or any char) in DataFrame?

I have strings of number characters that I want to convert to type int, but first I need to remove the parentheses and the number inside them (it's just a multiplier for my application; this is how I get the data).
Here is the sample code.
import pandas as pd
voltages = ['0', '0', '0', '0', '0', '310.000 (31)', '300.000 (30)', '190.000 (19)', '0', '20.000 (2)']
df = pd.DataFrame(voltages, columns=['Voltage'])
df
Out [1]:
Voltage
0 0
1 0
2 0
3 0
4 0
5 310.000 (31)
6 300.000 (30)
7 190.000 (19)
8 0
9 20.000 (2)
How can I remove the substrings within the parentheses? Is there a pandas Series.str way to do it?
Use str.replace with a regex (pass regex=True so the pattern is treated as a regular expression):
df.Voltage.str.replace(r"\s\(.*", "", regex=True)
Out:
0 0
1 0
2 0
3 0
4 0
5 310.000
6 300.000
7 190.000
8 0
9 20.000
Name: Voltage, dtype: object
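If the end goal is a numeric column rather than cleaned-up strings, a hedged follow-up sketch (the same replace plus pd.to_numeric) would be:
import pandas as pd

voltages = ['0', '0', '0', '0', '0', '310.000 (31)', '300.000 (30)',
            '190.000 (19)', '0', '20.000 (2)']
df = pd.DataFrame(voltages, columns=['Voltage'])
# strip the parenthesized multiplier, then convert to a numeric dtype
df['Voltage'] = pd.to_numeric(df['Voltage'].str.replace(r'\s\(.*', '', regex=True))
print(df['Voltage'].dtype)  # float64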
You can also use str.split()
df_2 = df['Voltage'].str.split(' ', n=1, expand=True).rename(columns={0: 'Voltage'})
df_2['Voltage'] = df_2['Voltage'].astype('float')
If you know the separating character will always be a space then the following is quite a neat way of doing it:
voltages = [i.rsplit(' ')[0] for i in voltages]
I think you could try this:
new_series = df['Voltage'].apply(lambda x:int(x.split('.')[0]))
df['Voltage'] = new_series
I hope it helps.
Hopefully, this will work for you:
idx = source_value.find(" (")
result = source_value[:idx] if idx != -1 else source_value
NOTE: find requires source_value to be a string, but if your value contains parentheses it presumably already is one; the idx != -1 check keeps values such as '0', which have no parentheses, intact.

Why is Python's max() function not accurate? [duplicate]

I tried to use the max() function but I can't get the right max with it.
Example:
numbers = "4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"
a = max(numbers.split(" "))
b = min(numbers.split(" "))
print a
print b
Output:
6
-214
It's obviously wrong, the max should be 542. Does anyone know why max() fails to find the correct max value? How to get the correct answer?
numbers.split(" ") gives you a list of strings, not integers.
If you want max() and min() to find the highest and lowest integers, then you need to convert your list of strings to a list of integers using map(int, your_array).
Example
numbers = "4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"
numbers = numbers.split(" ") # Splits your string into a list of strings
numbers = map(int, numbers) # Converts each element in your list to int
a = max(numbers)
b = min(numbers)
print a # Outputs 542
print b # Outputs -214
On the other hand, you don't need map or any other function to convert the string list into an integer list (that would iterate over the list one more time). max accepts a key parameter, so you can pass a callable there, like this:
a = max(numbers.split(), key=int)
b = min(numbers.split(), key=int)
Also, in this case split() behaves the same as split(" ").
Python's max() function is accurate.
You should have a look at numbers.split(" ").
It returns a list of strings, so max compares the strings in the list and returns the largest string.
>>> numbers.split(" ")
['4', '5', '29', '54', '4', '0', '-214', '542', '-64', '1', '-3', '6', '-6']
And, as string comparisons go, strings are compared character by character, so the max here is '6': it beats '542' because '6' > '5' at the first character.
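A quick illustration of that lexicographic comparison (a small sketch, not from the original answer):
print('6' > '542')        # True -- '6' beats '5' at the first character
print(max(['6', '542']))  # '6'
print(max([6, 542]))      # 542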
Because a and b are strings, not ints.
numbers="4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"
a = max(map(int, numbers.split(" ")))
b = min(map(int, numbers.split(" ")))
print a
print b
# 542
# -214
Try this:
numbers="4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"
a = max(list(map(int, numbers.split(" "))))
b = min(list(map(int, numbers.split(" "))))
print a
print b

Match to re-code letters and numbers in python (pandas)

I have a variable that mixes letters and numbers. The letters range over A:Z and the numbers over 2:8. I want to re-code this variable so that it is all numeric, with the letters A:Z becoming the numbers 1:26 and the numbers 2:8 becoming the numbers 27:33.
For example, I would like this variable:
Var1 = c('A',2,3,8,'C','W',6,'T')
To become this:
Var1 = c(1,27,28,33,3,23,31,20)
In R I can do this using 'match' like this:
Var1 = as.numeric(match(Var1, c(LETTERS, 2:8)))
How can I do this using python? Pandas?
Thank you
Make a dictionary and map the values:
import string
import numpy as np
dct = dict(zip(list(string.ascii_uppercase) + list(np.arange(2, 9)), np.arange(1, 34)))
# If they are strings of numbers, not integers use:
#dct = dict(zip(list(string.ascii_uppercase) + ['2', '3', '4', '5', '6', '7', '8'], np.arange(1, 34)))
df.col_name = df.col_name.map(dct)
An example:
import pandas as pd
df = pd.DataFrame({'col': [2, 4, 6, 3, 5, 'A', 'B', 'D', 'F', 'Z', 'X']})
df.col.map(dct)
Outputs:
0 27
1 29
2 31
3 28
4 30
5 1
6 2
7 4
8 6
9 26
10 24
Name: col, dtype: int64
I think this could help you:
Replacing letters with numbers with their position in the alphabet
Then you just need to apply it to your df column:
dt.Var1.apply(alphabet_position)
You can also try this:
for i in range(len(var1)):
    if type(var1[i]) == int:
        var1[i] = var1[i] + 25                 # 2..8  -> 27..33
    else:
        var1[i] = ord(var1[i].lower()) - 96    # 'A'..'Z' -> 1..26
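For completeness, a hedged end-to-end sketch of that loop idea written as a list comprehension, assuming Var1 mixes single letters and the integers 2-8 exactly as in the question:
var1 = ['A', 2, 3, 8, 'C', 'W', 6, 'T']
recoded = [v + 25 if isinstance(v, int) else ord(v.lower()) - 96 for v in var1]
print(recoded)  # [1, 27, 28, 33, 3, 23, 31, 20]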

Comparing two numbers lists with each other in Python

I have a data frame (possibly a list):
A = ['01', '20', '02', '25', '26']
B = ['10', '13', '14', '64', '32']
I would like to compare list 'a' with list 'b' in the following way:
Strings from the left column (A) are compared with strings from the right column (B), and two strings are combined when they share a boundary digit, i.e. the last digit of the A string equals the first digit of the B string; the shared digit is kept only once in the merged result. Why was the string '010' (from '01' + '10') removed? Because each digit may occur only once in a result.
You can perform a couple of string slicing operations and then merge on the common digit.
a
A
0 01
1 20
2 02
3 25
4 26
b
B
0 10
1 13
2 14
3 64
4 32
a['x'] = a.A.str[-1]
b['x'] = b.B.str[0]
b['B'] = b.B.str[1:]
m = a.merge(b)
You could also do this in a single line with assign, without disrupting the original dataframes:
m = a.assign(x=a.A.str[-1]).merge(b.assign(x=b.B.str[0], B=b.B.str[1:]))
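To make the intermediate step concrete, here is a small hedged sketch (the lists from the question wrapped into one-column dataframes) showing what m contains:
import pandas as pd

a = pd.DataFrame({'A': ['01', '20', '02', '25', '26']})
b = pd.DataFrame({'B': ['10', '13', '14', '64', '32']})
a['x'] = a.A.str[-1]    # last digit of each A string
b['x'] = b.B.str[0]     # first digit of each B string
b['B'] = b.B.str[1:]    # drop the shared digit from B
m = a.merge(b)
print(m)
#     A  x  B
# 0  01  1  0
# 1  01  1  3
# 2  01  1  4
# 3  26  6  4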
For uniqueness, convert each merged string to a set and check that its length is unchanged.
v = (m['A'] + m['B'])
v.str.len() == v.apply(set).str.len()
0 False
1 True
2 True
3 True
dtype: bool
v[v.str.len() == v.apply(set).str.len()].tolist()
['013', '014', '264']
Something you should be aware of is whether you are actually passing integers, not strings. In that case A = [01, 20, 02, 25, 26] would be the same as A = [1, 20, 2, 25, 26] (and literals with leading zeros are a syntax error in Python 3). If you always know that you're going to be working with integers <= 99, this won't be an issue. Otherwise, you should use strings instead of integers, like A = ['01', '20', '02', '25', '26']. So the first thing you should do is convert the lists to lists of strings. If you know all of the integers will be <= 99, you can do so like this:
A = ['%02d' % i for i in A]
B = ['%02d' % i for i in B]
(You could also give these different names if you want to preserve the integer lists.) Then here is the solution:
final = []
for i in A:
    for j in B:
        if i[-1] == j[0]:          # shared boundary digit
            final.append(i + j[1:])
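That loop does not yet enforce the question's each-digit-only-once rule, so a hedged complete sketch would be:
A = ['01', '20', '02', '25', '26']
B = ['10', '13', '14', '64', '32']

final = []
for i in A:
    for j in B:
        if i[-1] == j[0]:                        # shared boundary digit
            merged = i + j[1:]
            if len(set(merged)) == len(merged):  # every digit occurs at most once
                final.append(merged)

print(final)  # ['013', '014', '264']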
