Program to output letter pyramid

I want to print the following output:
A
A B
A B C
A B C D
A B C D E
I used the following code, but it does not work correctly.
strg = "A B C D E F"
i = 0
while i < len(strg):
    print strg[0:i+1]
    print "\n"
    i = i + 1
For this code the obtained output is:
A
A
A B
A B
A B C
A B C
A B C D
A B C D
A B C D E
A B C D E
A B C D E F
Why does each line get printed twice?

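The spaces count as characters too. In "A B C D E F" every second index holds a space, so a slice that ends on a space prints the same visible text as the slice before it. Increment i by 2 instead of 1. Try:
strg = "A B C D E F"
i = 0
while i < len(strg):
    print strg[0:i+2]
    print "\n"
    i = i+2
This steps over the indices that hold spaces, so each pass through the loop adds exactly one new letter.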
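To see why the original loop printed every line twice, it can help to look at the slices with repr, which makes the trailing spaces visible. This is only an illustrative sketch written for Python 3; the name slices is mine and does not appear in the question.

```python
strg = "A B C D E F"

# Take the first four slices exactly as the original loop did (step of 1).
slices = [strg[0:i + 1] for i in range(4)]

# repr() shows the trailing space that makes two lines look identical.
for s in slices:
    print(repr(s))
```

Every second slice differs from the previous one only by a trailing space, which is invisible once printed.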
A little more Pythonic:
>>> strg = "ABCDEF"
>>> for index,_ in enumerate(strg):
...     print " ".join(strg[:index+1])
...
A
A B
A B C
A B C D
A B C D E
A B C D E F
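The snippets above use the Python 2 print statement. For anyone on Python 3, the same idea could be sketched roughly as follows. The helper name pyramid_lines is hypothetical, and the letters are assumed to be stored without spaces.

```python
def pyramid_lines(letters):
    """Return one line per prefix of `letters`, with the letters space-separated."""
    return [" ".join(letters[:end]) for end in range(1, len(letters) + 1)]


for line in pyramid_lines("ABCDE"):
    print(line)
```

Keeping the letters in a string without spaces and adding the separators only at print time avoids the index arithmetic altogether.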

Related

Why does pandas read_excel shift columns with each iteration?

I'm trying to read a bunch of Excel files (700+) and compile them into a single database using a for loop. However, each iteration of the loop shifts the first four columns to the end of the data set in a bizarre repeating pattern. I'm not practiced in Python, and I can't figure out what's causing this.
excel_files = glob.glob("/State Report_2020****308.xls")
list1 = [pd.read_excel(filename, sheet_name="Raw Data", usecols = "A:S", skiprows = 9, nrows=33-10) for filename in excel_files]
raw_data = pd.concat(list1, axis=0, ignore_index=True)
For example the data extracted from the first sheet looks like:
A B C ... Q R S
a b c ... q r s
a b c ... q r s
a b c ... q r s
a b c ... q r s
Then Data from the second sheet is extracted, appended to the bottom of the data frame, and looks like:
A B C ... Q R S T U V W
e f g ... q r S a b c d
e f g ... q r S a b c d
e f g ... q r S a b c d
e f g ... q r S a b c d
Then data from the third sheet is iterated into the data frame, and looks like:
A B C ... Q R S T U V W X Y Z AA
e f g ... q r S a b c d
e f g ... q r S a b c d
e f g ... q r S a b c d
e f g ... q r S a b c d
This pattern repeats with every iteration shifting the first columns of data further to the right.

The error came from "skiprows = 9" in the pd.read_excel call. Row 9 held the column headers for the table, so skipping it meant the first data row of each file was read as the header instead. I had skipped it because I thought that if I left the headers in, they would be added as a row with each subsequent iteration, like:
A B C ... Q R S
a b c ... q r s
a b c ... q r s
...............
A B C ... Q R S
a b c ... q r s
a b c ... q r s
...............
A B C ... Q R S
a b c ... q r s
a b c ... q r s
...............
Instead, I kept the header row by using "skiprows = 8" and got the result I originally wanted:
A B C ... Q R S
a b c ... q r s
a b c ... q r s
...............
a b c ... q r s
a b c ... q r s
...............
a b c ... q r s
a b c ... q r s
...............
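The shifting columns are presumably a side effect of how pd.concat lines frames up: it matches columns by name, so frames whose "headers" are really data rows get their values placed in new columns. A minimal sketch with two made-up frames (sheet1 and sheet2 are hypothetical stand-ins for the Excel sheets, not the real data):

```python
import pandas as pd

# First sheet: the real header row was read as the header.
sheet1 = pd.DataFrame([["a", "b"], ["a", "b"]], columns=["A", "B"])

# Second sheet: the header row was skipped, so a data row became the header.
sheet2 = pd.DataFrame([["e", "f"]], columns=["c", "d"])

# concat aligns on column names; unmatched names become extra columns.
combined = pd.concat([sheet1, sheet2], axis=0, ignore_index=True)
print(list(combined.columns))
print(combined.shape)
```

The rows from the second frame land in the new columns, and the cells they do not fill are left as NaN.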

Pipeline with count and tfidf vectorizer produces TypeError: expected string or bytes-like object

I have a corpus like the following
'C C C 0 0 0 X 0 1 0 0 0 0', 'C C C 0 0 0 X 0 1 0 0 0 0', 'C C C 0 0 0 X 0 1 0 0 0 0', 'X X X', 'X X X', 'X X X',
I would like to use count and tfidf vectorizer along with logistic regression as a classifier.
The code below I have adapted from sklearn's samples.
from pprint import pprint
from time import time
import logging
import pickle
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
print(__doc__)
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
# #############################################################################
# Define a pipeline combining a text feature extractor with a simple
# classifier
pipeline = Pipeline([
    ('vect', CountVectorizer(analyzer='char',lowercase=False)),
    ('tfidf', TfidfVectorizer(analyzer='char',lowercase=False)),
    ('clf', LogisticRegression()),
])
# uncommenting more parameters will give better exploring power but will
# increase processing time in a combinatorial way
parameters = {
    'vect__max_df': (0.5, 0.75, 1.0),
    # 'vect__max_features': (None, 5000, 10000, 50000),
    'vect__ngram_range': ((1, 1), (1, 2)), # unigrams or bigrams
    # 'tfidf__use_idf': (True, False),
    # 'tfidf__norm': ('l1', 'l2'),
    'clf__max_iter': (1000,),
    'clf__C': (0.00001, 0.000001),
    'clf__penalty': ('l2', 'elasticnet'),
    # 'clf__max_iter': (10, 50, 80),
}
if __name__ == "__main__":
    # multiprocessing requires the fork to happen in a __main__ protected
    # block
    # find the best parameters for both the feature extraction and the
    # classifier
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)
corpus =['C C C 0 0 0 X 0 1 0 0 0 0', 'C C C 0 0 0 X 0 1 0 0 0 0', 'C C C 0 0 0 X 0 1 0 0 0 0', 'X X X', 'X X X',
'X X X', 'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X 0',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X',
'C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C X X X X 0 0 0 X 0 X X']
y_train = [0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
print(len(corpus),len(y_train))
print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
print("parameters:")
pprint(parameters)
t0 = time()
#print(type(data.data),type(data.target))
#print(data.data[:1])
#print(data.data[:2])
grid_search.fit(corpus,y_train)
print("done in %0.3fs" % (time() - t0))
print()
print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))
My stack trace is as follows
Automatically created module for IPython interactive environment
50 50
Performing grid search...
pipeline: ['vect', 'tfidf', 'clf']
parameters:
{'clf__C': (1e-05, 1e-06),
'clf__max_iter': (1000,),
'clf__penalty': ('l2', 'elasticnet'),
'vect__max_df': (0.5, 0.75, 1.0),
'vect__ngram_range': ((1, 1), (1, 2))}
Fitting 5 folds for each of 24 candidates, totalling 120 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed: 0.1s finished
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-114-0d47590b1279> in <module>
107 #print(data.data[:2])
108
--> 109 grid_search.fit(corpus,y_train)
110 print("done in %0.3fs" % (time() - t0))
111 print()
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
737 refit_start_time = time.time()
738 if y is not None:
--> 739 self.best_estimator_.fit(X, y, **fit_params)
740 else:
741 self.best_estimator_.fit(X, **fit_params)
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
348 This estimator
349 """
--> 350 Xt, fit_params = self._fit(X, y, **fit_params)
351 with _print_elapsed_time('Pipeline',
352 self._log_message(len(self.steps) - 1)):
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params)
313 message_clsname='Pipeline',
314 message=self._log_message(step_idx),
--> 315 **fit_params_steps[name])
316 # Replace the transformer of the step with the fitted
317 # transformer. This is necessary when loading the transformer
E:\anaconda\envs\appliedaicourse\lib\site-packages\joblib\memory.py in __call__(self, *args, **kwargs)
350
351 def __call__(self, *args, **kwargs):
--> 352 return self.func(*args, **kwargs)
353
354 def call_and_shelve(self, *args, **kwargs):
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
726 with _print_elapsed_time(message_clsname, message):
727 if hasattr(transformer, 'fit_transform'):
--> 728 res = transformer.fit_transform(X, y, **fit_params)
729 else:
730 res = transformer.fit(X, y, **fit_params).transform(X)
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
1857 """
1858 self._check_params()
-> 1859 X = super().fit_transform(raw_documents)
1860 self._tfidf.fit(X)
1861 # X is already a transformed view of raw_documents so
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
1218
1219 vocabulary, X = self._count_vocab(raw_documents,
-> 1220 self.fixed_vocabulary_)
1221
1222 if self.binary:
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
1129 for doc in raw_documents:
1130 feature_counter = {}
-> 1131 for feature in analyze(doc):
1132 try:
1133 feature_idx = vocabulary[feature]
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\feature_extraction\text.py in _analyze(doc, analyzer, tokenizer, ngrams, preprocessor, decoder, stop_words)
108 doc = ngrams(doc, stop_words)
109 else:
--> 110 doc = ngrams(doc)
111 return doc
112
E:\anaconda\envs\appliedaicourse\lib\site-packages\sklearn\feature_extraction\text.py in _char_ngrams(self, text_document)
255 """Tokenize text_document into a sequence of character n-grams"""
256 # normalize white spaces
--> 257 text_document = self._white_spaces.sub(" ", text_document)
258
259 text_len = len(text_document)
TypeError: expected string or bytes-like object
I ran the TfidfVectorizer on its own and got the following results:
vectorizer = TfidfVectorizer(analyzer='char',lowercase=False,ngram_range=(6, 6))
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names())
print(X.shape)
print(X)
Results
<class 'list'>
[' 0 0 0', ' 0 0 X', ' 0 1 0', ' 0 X 0', ' 0 X X', ' 1 0 0', ' C 0 0', ' C C 0', ' C C C', ' C C X', ' C X X', ' X 0 0', ' X 0 1', ' X 0 X', ' X X 0', ' X X X', '0 0 0 ', '0 0 X ', '0 1 0 ', '0 X 0 ', '1 0 0 ', 'C 0 0 ', 'C C 0 ', 'C C C ', 'C C X ', 'C X X ', 'X 0 0 ', 'X 0 1 ', 'X 0 X ', 'X X 0 ', 'X X X ']
(50, 31)
(0, 20) 0.31810783213188626
(0, 5) 0.31810783213188626
(0, 18) 0.31810783213188626
(0, 2) 0.31810783213188626
(0, 27) 0.31810783213188626
(0, 12) 0.31810783213188626
(0, 19) 0.16116825632411622
(0, 3) 0.16116825632411622
(0, 17) 0.16116825632411622
(0, 1) 0.11378963445554637
(0, 16) 0.22757926891109273
(0, 0) 0.3413689033666391
(0, 21) 0.17370780684495662
(0, 6) 0.17370780684495662
(0, 22) 0.17370780684495662
(0, 7) 0.17370780684495662
(0, 23) 0.11378963445554637
(1, 20) 0.31810783213188626
(1, 5) 0.31810783213188626
(1, 18) 0.31810783213188626
...
...
...
(49, 1) 0.01436413072356797
(49, 16) 0.01436413072356797
(49, 0) 0.01436413072356797
(49, 23) 0.6894782747312626
My question
Why does the standalone vectorizer work, while the same vectorizer raises a TypeError when placed in a pipeline that is used by GridSearchCV?

By default, both CountVectorizer and TfidfVectorizer expect an iterable of raw documents, each of type str or bytes. In your pipeline, CountVectorizer receives the corpus and passes a sparse matrix of counts (scipy.sparse.csr_matrix) on to TfidfVectorizer. Since that matrix is not text, TfidfVectorizer fails with "TypeError: expected string or bytes-like object". Your pipeline works if you use either vectorizer, but not both. For example,
pipeline = Pipeline([
    #('vect', CountVectorizer(analyzer='char',lowercase=False)),
    ('tfidf', TfidfVectorizer(analyzer='char',lowercase=False)),
    ('clf', LogisticRegression())
])
# uncommenting more parameters will give better exploring power but will
# increase processing time in a combinatorial way
parameters = {
    #'vect__max_df': (0.5, 0.75, 1.0),
    # 'vect__max_features': (None, 5000, 10000, 50000),
    #'vect__ngram_range': [(1, 1), (1, 2)], # unigrams or bigrams
    'tfidf__use_idf': [True, False],
    'tfidf__norm': ['l1', 'l2'],
    'clf__max_iter': [1000],
    'clf__C': [0.00001, 0.000001],
    'clf__penalty': ['l2'],
    # 'clf__max_iter': (10, 50, 80),
}
produces the following output:
50 50
Performing grid search...
pipeline: ['tfidf', 'clf']
parameters:
{'clf__C': [1e-05, 1e-06],
'clf__max_iter': [1000],
'clf__penalty': ['l2'],
'tfidf__norm': ['l1', 'l2'],
'tfidf__use_idf': [True, False]}
Fitting 5 folds for each of 8 candidates, totalling 40 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
done in 0.347s
Best score: 0.680
Best parameters set:
clf__C: 1e-05
clf__max_iter: 1000
clf__penalty: 'l2'
tfidf__norm: 'l1'
tfidf__use_idf: True
[Parallel(n_jobs=-1)]: Done 40 out of 40 | elapsed: 0.2s finished
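If the intent was to keep counting and tf-idf weighting as two separate steps, one option is to follow CountVectorizer with TfidfTransformer, which accepts a count matrix rather than raw text. A minimal sketch, assuming a tiny made-up corpus and labels (docs and labels are placeholders, not the data from the question):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny made-up corpus and labels, for illustration only.
docs = ["C C C 0 0 0 X", "X X X", "C C 0 0 X", "X X X X 0"]
labels = [0, 1, 0, 1]

pipeline = Pipeline([
    # raw text -> sparse matrix of character counts
    ("vect", CountVectorizer(analyzer="char", lowercase=False)),
    # count matrix -> tf-idf weighted matrix
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(docs, labels)
predictions = pipeline.predict(docs)
print(predictions.shape)
```

With this layout the vect__ parameters and the tfidf__use_idf and tfidf__norm parameters can all stay in the grid, since both steps exist in the pipeline.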

Find row number for specific change in python numpy

I have a file like this:
C
C
C
C
C
C
C
C
B
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
B
B
B
B
B
I would like to print the row numbers where the value changes from C to B, like rows 9 and 27 here. I need the row numbers only. How can I do that with NumPy in Python?
Thank you

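This is one possible solution, but it doesn't use NumPy:
previous = None
with open('yourfile.txt', 'r') as f:
    for x, line in enumerate(f, start=1):
        current = line.strip()  # drop the trailing newline before comparing
        # report only the rows where a C is followed by a B
        if current == 'B' and previous == 'C':
            print('row' + str(x))
        previous = current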

If you want pure NumPy (so the comparison runs in compiled code rather than in a Python loop), I advise using np.flatnonzero on a comparison of neighbouring items:
import numpy as np
x = np.loadtxt(r'your_file.txt', dtype='O')
marker_idx = np.flatnonzero((x[1:]=='B') & (x[:-1]=='C')) + 1 #default way
print(marker_idx + 1) #since you are counting from 1 for some reason
Output
[ 9 27]
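To try the same neighbour comparison without a file on disk, the column from the question can be built in memory. This is a sketch under that assumption; rows, is_change and row_numbers are names I made up for the example.

```python
import numpy as np

# Same column as in the question: 8 C, 1 B, 17 C, 5 B.
rows = np.array(list("C" * 8 + "B" + "C" * 17 + "B" * 5))

# True wherever an element is 'B' and the element before it is 'C'.
is_change = (rows[1:] == "B") & (rows[:-1] == "C")

# +1 maps positions in rows[1:] back to rows; another +1 makes them 1-based.
row_numbers = np.flatnonzero(is_change) + 1 + 1
print(row_numbers)
```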

How to leave only one defined sub-string in a string in Python

Say I have one of the strings:
"a b c d e f f g" || "a b c f d e f g"
And I want there to be only one occurrence of a substring (f in this instance) throughout the string so that it is somewhat sanitized.
The result of each string would be:
"a b c d e f g" || "a b c d e f g"
An example of the use would be:
str = "a b c d e f g g g g g h i j k l"
str.leaveOne("g")
#// a b c d e f g h i j k l

If it doesn't matter which instance you leave, you can use str.replace, which takes a parameter signifying the number of replacements you want to perform:
def leave_one_last(source, to_remove):
    return source.replace(to_remove, '', source.count(to_remove) - 1)
This will leave the last occurrence.
We can modify it to leave the first occurrence by reversing the string twice:
def leave_one_first(source, to_remove):
    return source[::-1].replace(to_remove, '', source.count(to_remove) - 1)[::-1]
However, that is ugly, not to mention inefficient. A more elegant way might be to take the substring that ends with the first occurrence of the character to find, replace occurrences of it in the rest, and finally concatenate them together:
def leave_one_first_v2(source, to_remove):
    first_index = source.index(to_remove) + 1
    return source[:first_index] + source[first_index:].replace(to_remove, '')
If we try this:
string = "a b c d e f g g g g g h i j k l g"
print(leave_one_last(string, 'g'))
print(leave_one_first(string, 'g'))
print(leave_one_first_v2(string, 'g'))
Output:
a b c d e f h i j k l g
a b c d e f g h i j k l
a b c d e f g h i j k l
If you don't want to keep spaces, then you should use a version based on split:
def leave_one_split(source, to_remove):
    chars = source.split()
    first_index = chars.index(to_remove) + 1
    return ' '.join(chars[:first_index] + [char for char in chars[first_index:] if char != to_remove])
string = "a b c d e f g g g g g h i j k l g"
print(leave_one_split(string, 'g'))
Output:
a b c d e f g h i j k l

If I understand correctly, you can just use a regex and re.sub to look for groups of two or more of your letter with or without a space and replace it by a single instance:
import re
def leaveOne(s, char):
    return re.sub(r'((%s\s?)){2,}' % char, r'\1' , s)
leaveOne("a b c d e f g g g h i j k l", 'g')
# 'a b c d e f g h i j k l'
leaveOne("a b c d e f ggg h i j k l", 'g')
# 'a b c d e f g h i j k l'
leaveOne("a b c d e f g h i j k l", 'g')
# 'a b c d e f g h i j k l'
EDIT
If the goal is to get rid of all occurrences of the letter except one, you can still use a regex with a lookahead to select all letters followed by the same:
import re
def leaveOne(s, char):
    return re.sub(r'(%s)\s?(?=.*?\1)' % char, '' , s)
print(leaveOne("a b c d e f g g g h i j k l g", 'g'))
# 'a b c d e f h i j k l g'
print(leaveOne("a b c d e f ggg h i j k l gg g", 'g'))
# 'a b c d e f h i j k l g'
print(leaveOne("a b c d e f g h i j k l", 'g'))
# 'a b c d e f g h i j k l'
This should even work with more complicated patterns like:
leaveOne("a b c ffff d e ff g", 'ff')
# 'a b c d e ff g'
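One caveat with building the pattern from the argument: if the substring contains regex metacharacters such as "." or "+", they are interpreted as part of the pattern. A hedged variant of the lookahead version that passes the substring through re.escape first might look like this (leave_one_escaped is a hypothetical name, not from the answer above):

```python
import re


def leave_one_escaped(s, sub):
    """Remove every occurrence of `sub` that is followed later by another one."""
    # re.escape makes characters like '.' match literally.
    pattern = r'(%s)\s?(?=.*?\1)' % re.escape(sub)
    return re.sub(pattern, '', s)


print(leave_one_escaped("a . b . c .", "."))
print(leave_one_escaped("a g g b g", "g"))
```

As in the version above, it is the last occurrence that survives.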

Given the string:
mystr = 'defghhabbbczasdvakfafj'
cache = {}
seq = 0
for i in mystr:
    if i not in cache:
        cache[i] = seq
        print (cache[i])
        seq+=1
mylist = []
Here I order the dictionary by its values, so each character is kept once, in order of first appearance:
for key,value in sorted(cache.items(),key=lambda x : x[1]):
    mylist.append(key)
print ("".join(mylist))
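On Python 3.7 and later, where dicts keep insertion order, the same first-occurrence deduplication can be sketched more compactly with dict.fromkeys. The name deduped is just for the example.

```python
mystr = 'defghhabbbczasdvakfafj'

# dict.fromkeys keeps the first occurrence of each character, in order.
deduped = "".join(dict.fromkeys(mystr))
print(deduped)
```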

Alternative solution for printing pattern using python

I want to print a pattern using Python. I have done it, but I would like to know what other solutions are possible:
A B C D E F G F E D C B A
A B C D E F   F E D C B A
A B C D E       E D C B A
......
....
A                       A
Here is my code:
n=0
for i in range(71,64,-1):
    for j in range(65,i+1):
        a=chr(j)
        print(a, end=" ")
    if n>0:
        for l in range(1,3+(n-1)*4):
            print(end=" ")
    if i<71:
        j=j+1
    for k in range(j-1,64,-1):
        b=chr(k)
        print(b, end=" ")
    n=n+1
    print()

Here's an alternative method using 3rd party library numpy. I use this library specifically because it allows vectorised assignment, which I use instead of an inner loop.
from string import ascii_uppercase
import numpy as np
n = 7
# extract first n letters from alphabet
letters = ascii_uppercase[:n]
res = np.array([list(letters + letters[-2::-1])] * (n-1))
# generate indices that are removed per line
idx = (range(n-i-1, n+i) for i in range(n-1))
# printing logic
print(' '.join(res[0]))
for i, j in enumerate(idx):
    # vectorised assignment
    res[i, j] = ' '
    print(' '.join(res[i]))
Result:
A B C D E F G F E D C B A
A B C D E F   F E D C B A
A B C D E       E D C B A
A B C D           D C B A
A B C               C B A
A B                   B A
A                       A
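If pulling in NumPy feels heavy for this, the same blank-out-the-middle idea can be sketched with plain lists. This is one possible variant rather than a canonical solution; pattern_lines is a hypothetical helper name.

```python
from string import ascii_uppercase


def pattern_lines(n):
    """Return the pattern rows for the first n letters as a list of strings."""
    letters = ascii_uppercase[:n]
    full = letters + letters[-2::-1]  # e.g. ABCDEFGFEDCBA for n = 7
    lines = []
    for r in range(n):
        # Row r blanks a centred window covering positions n-r .. n+r-2.
        row = [" " if n - r <= pos < n + r - 1 else ch
               for pos, ch in enumerate(full)]
        lines.append(" ".join(row))
    return lines


for line in pattern_lines(7):
    print(line)
```

Each row starts from the full palindrome and replaces a growing middle window with spaces, so every line keeps the same width.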
