MoviePy: multiple TextClips one after another - Python

I have an audio file and its script file, which looks like this:
One day I was playing Fortnite
solos until I got a strange fend
invites I never seen before I joned
it an he said time to play
a game I sruged it of like
no big deal I regreted doing that
but he kept saying time to play
over and over and over ugen
My goal is to make a video where the voice is followed by the matching text appearing on the screen (the next line appears, the previous one disappears). The way I do it is obviously wrong: it just renders all the lines stacked on top of each other at the beginning of the video. Also, I can't know in advance how many lines of script there will be, so writing texts[0], texts[1], ... by hand is not an option. Please send help!
My code:
videoclip = VideoFileClip("Satisfying Minecraft Parkour.mp4")
audioclip = AudioFileClip("audio.mp3")
new_audioclip = CompositeAudioClip([audioclip])
videoclip.audio = new_audioclip
texts = []
with open('text.txt', 'r') as f:
    for line in f:
        txt_clip = TextClip(line, fontsize=55, color='white')
        txt_clip = txt_clip.set_pos('center')
        txt_clip = txt_clip.set_duration(audio_in_seconds/len(str(text))*len(line))
        texts.append(txt_clip)
video = CompositeVideoClip([videoclip, texts[0]])
video = CompositeVideoClip([video, texts[1]])
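
A minimal sketch of one way to fix this, assuming MoviePy 1.x: give each TextClip a start time with set_start and a duration proportional to its share of the script, then composite everything in a single CompositeVideoClip. The per-character timing is a crude estimate, not taken from the question:

from moviepy.editor import VideoFileClip, AudioFileClip, TextClip, CompositeVideoClip

videoclip = VideoFileClip("Satisfying Minecraft Parkour.mp4")
audioclip = AudioFileClip("audio.mp3")
videoclip = videoclip.set_audio(audioclip)

with open("text.txt") as f:
    lines = [line.strip() for line in f if line.strip()]

total_chars = sum(len(line) for line in lines)
t = 0  # running start time of the current line
texts = []
for line in lines:
    # crude estimate: each line gets screen time proportional to its length
    duration = audioclip.duration * len(line) / total_chars
    txt = (TextClip(line, fontsize=55, color="white")
           .set_pos("center")
           .set_start(t)
           .set_duration(duration))
    texts.append(txt)
    t += duration

# one composite over all clips; non-overlapping start times make each line
# replace the previous one
video = CompositeVideoClip([videoclip, *texts])
video.write_videofile("out.mp4")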

Related

How to display text on the screen as it is said over the audio

As a personal project, I decided to create one of those Reddit text-to-speech bots.
I pulled all the data from Reddit with praw:
import praw, random

def scrapeData(subredditName):
    # Instantiate praw
    reddit = praw.Reddit()
    # Get subreddit
    subreddit = reddit.subreddit(subredditName)
    # Get a bunch of posts and convert them into a list
    posts = list(subreddit.new(limit=100))
    # Get a random index (fewer than 100 posts may be returned)
    randomNumber = random.randint(0, len(posts) - 1)
    # Store the post's title and description in variables
    postTitle = posts[randomNumber].title
    postDesc = posts[randomNumber].selftext
    return postTitle + " " + postDesc
Then I converted it to speech stored in an .mp3 file with Google Cloud Text-to-Speech:
from google.cloud import texttospeech

def convertTextToSpeech(textString):
    # Instantiate the TTS client (from_service_account_json is a classmethod)
    client = texttospeech.TextToSpeechClient.from_service_account_json("path/to/json")
    # Set text input to be synthesized
    synthesisInput = texttospeech.SynthesisInput(text=textString)
    # Build the voice request
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.MALE)
    # Select the type of audio file
    audioConfig = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3)
    # Perform the TTS request on the text input
    response = client.synthesize_speech(
        input=synthesisInput, voice=voice, audio_config=audioConfig)
    # Write the binary audio content to an mp3 file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
I've created an .mp4 with moviepy that has generic footage in the background with the audio synced over it,
from moviepy.editor import *
from moviepy.video.tools.subtitles import SubtitlesClip
# get video and audio source files
clip = VideoFileClip("background.mp4").subclip(20,30)
audio = AudioFileClip("output.mp3").subclip(0, 10)
# Set audio and create final video
videoClip = clip.set_audio(audio)
videoClip.write_videofile("output.mp4")
but my issue is I can't find a way to have only the current word or sentence displayed on screen as a subtitle, rather than the entire post.
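
One direction (not from the original post): moviepy's SubtitlesClip, already imported above, can overlay timed text if you have per-sentence timings. A minimal sketch, assuming a hypothetical subtitles.srt whose timings you would have to produce or estimate yourself:

from moviepy.editor import VideoFileClip, AudioFileClip, TextClip, CompositeVideoClip
from moviepy.video.tools.subtitles import SubtitlesClip

# each subtitle entry becomes a TextClip built by this generator
generator = lambda txt: TextClip(txt, fontsize=40, color="white")
subs = SubtitlesClip("subtitles.srt", generator)  # hypothetical .srt with sentence timings

clip = VideoFileClip("background.mp4").subclip(20, 30)
audio = AudioFileClip("output.mp3").subclip(0, 10)
video = CompositeVideoClip([clip, subs.set_pos(("center", "bottom"))])
video.set_audio(audio).write_videofile("output.mp4")

Only the text of the currently active subtitle is shown at any moment, which gives the sentence-by-sentence effect instead of the entire post.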

How to change volume of stem files while playing using python

I'm attempting to write a Python project that plays multiple parts of a song at the same time.
For background information, a song is split into "stems", and each stem is played simultaneously to recreate the full song. What I am trying to achieve is using potentiometers to control the volume of each stem, so that the user can mix songs differently. For a commercial analogue, Kanye West's Stem Player is what I am trying to achieve.
I can change the volume of the overlaid song at the end, but what I want to do is change the volume of each stem using a potentiometer while the song is playing. Is this even possible using pydub? Below is the code I have right now.
from pydub import AudioSegment
from pydub.playback import play

vocals = AudioSegment.from_file("walkin_vocals.mp3")
drums = AudioSegment.from_file("walkin_drums.mp3")
bass = AudioSegment.from_file("walkin_bass.mp3")

# mix the stems by overlaying them on top of each other
vocalsDrums = vocals.overlay(drums)
bassVocalsDrums = vocalsDrums.overlay(bass)

# reduce the gain of the whole mix by 20 dB
songQuiet = bassVocalsDrums - 20
play(songQuiet)
I solved this myself: I ended up using pyaudio instead of pydub.
With pyaudio, I was able to define a custom stream_callback function. Within this callback, I multiply each stem by a volume modifier, then sum the stems into one audio output.
def callback(in_data, frame_count, time_info, status):
    global drumsMod, vocalsMod, bassMod, otherMod
    # read the next block of frames from each stem's wave reader
    drums = drumsWF.readframes(frame_count)
    vocals = vocalsWF.readframes(frame_count)
    bass = bassWF.readframes(frame_count)
    other = otherWF.readframes(frame_count)
    # decode the raw bytes into 16-bit integer samples
    decodedDrums = numpy.frombuffer(drums, numpy.int16)
    decodedVocals = numpy.frombuffer(vocals, numpy.int16)
    decodedBass = numpy.frombuffer(bass, numpy.int16)
    decodedOther = numpy.frombuffer(other, numpy.int16)
    # scale each stem by its volume modifier and sum them into one signal
    newdata = (decodedDrums*drumsMod + decodedVocals*vocalsMod
               + decodedBass*bassMod + decodedOther*otherMod).astype(numpy.int16)
    return (newdata.tobytes(), pyaudio.paContinue)
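
For context, a sketch of how such a callback might be wired up; the four wave readers, the file names, and the initial modifier values are assumptions (and wave only reads .wav files, not .mp3):

import wave
import numpy
import pyaudio

# assumed stem files converted to .wav (the wave module cannot read .mp3)
drumsWF = wave.open("walkin_drums.wav", "rb")
vocalsWF = wave.open("walkin_vocals.wav", "rb")
bassWF = wave.open("walkin_bass.wav", "rb")
otherWF = wave.open("walkin_other.wav", "rb")

# volume modifiers; in the final project these would be driven by the potentiometers
drumsMod = vocalsMod = bassMod = otherMod = 1.0

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(drumsWF.getsampwidth()),
                channels=drumsWF.getnchannels(),
                rate=drumsWF.getframerate(),
                output=True,
                stream_callback=callback)
stream.start_stream()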

Print image and record audio input

I am very green at programming but wish to learn and develop.
I want to write a simple application that will be useful in linguistic treatments - but at first it is a simple demo.
The application is meant to display an image and record sound while it is shown.
There are a few variables - the interval and the image/sound/movie clip paths - taken from an external txt file (for now; later I would like to build a configuration creator with presaved setups).
The config file currently looks like:
10
path1
path2
...
The first line sets the interval in seconds; the following lines are paths to images, sounds, or movie clips (I tried with images for now).
#!/usr/bin/python
# main.py
import sys
from PyQt4 import QtGui, QtCore
from Tkinter import *
import numpy as np
import pyaudio
import wave
import time
from PIL import Image, ImageTk
import multiprocessing
import threading
from threading import Thread

master = Tk()

conf_file = open("conf.txt", "r")  # open conf file read-only
conf_lines = conf_file.readlines()
conf_file.close()
interwal = int(conf_lines[0])  # interval value from conf.txt (converted from string to int)
bodziec1 = conf_lines[1].strip()  # paths to stimulus files (img / audio / video)
bodziec2 = conf_lines[2].strip()
bodziec3 = conf_lines[3].strip()

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = interwal  # every stimulus gets its own audio record file for further work
timestr = time.strftime("%Y%m%d-%H%M%S")  # filename is year/month/day - hour/minute/second for easier systematization

def nagrywanie():  # recording action - found somewhere on the all-knowing web
    p = pyaudio.PyAudio()  # was misspelled "PyAudo" originally
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* nagrywanie")  # info that recording has started
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* koniec nagrywania")  # info that recording has ended
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf = wave.open(timestr + ".wav", 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

def bod1():  # 1st stimulus to display / play
    image = Image.open(bodziec1)
    photo = ImageTk.PhotoImage(image)

def bod2():  # 2nd stimulus to display / play
    image = Image.open(bodziec2)
    photo = ImageTk.PhotoImage(image)

def bod3():  # 3rd stimulus to display / play
    image = Image.open(bodziec3)
    photo = ImageTk.PhotoImage(image)

def odpal():  # attempt to run display and recording at the same time
    Thread(target=bod1).start()
    Thread(target=nagrywanie).start()
    # Wait for the interval given in the first line of conf.txt
    time.sleep(interwal)
    # Terminate odpal - stop giving the impetus
    bod1.terminate()
    # Cleanup - ?? this part is also copied from the all-knowing internet
    p.join()

b = Button(master, text="OK", command=odpal)  # wanted the program to be easy for non-programmers, so a few buttons are necessary
b.pack()
mainloop()
When I asked a few programmers about the code, they said it is as simple as riding a bike, so I wanted to learn how to write it by myself.
I guess it is a piece of cake for professionals - a thousand thanks to those who are even willing to read this junk.
It takes a lot of time for me to understand and figure out the exact commands, which is why I am politely asking for help - not only for education but also for better diagnosis.
Excuse my language - English is not my native language.
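
Not from the original post, but here is a minimal sketch of the display-plus-record flow described above, assuming Python 2 with Tkinter, PIL, and pyaudio; the file names and the single-image flow are illustrative:

import threading, time, wave, pyaudio
import Tkinter as tk
from PIL import Image, ImageTk

# read the interval (seconds) and stimulus paths from the config file
with open("conf.txt") as f:
    lines = [l.strip() for l in f]
interval = int(lines[0])
image_paths = lines[1:]

def record(seconds, filename):
    # record mono 16-bit audio from the default input device
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100,
                    input=True, frames_per_buffer=1024)
    frames = [stream.read(1024) for _ in range(int(44100 / 1024 * seconds))]
    stream.stop_stream(); stream.close(); p.terminate()
    wf = wave.open(filename, "wb")
    wf.setnchannels(1)
    wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(44100)
    wf.writeframes(b"".join(frames))
    wf.close()

root = tk.Tk()
label = tk.Label(root)
label.pack()

def show_and_record():
    # display the first stimulus and record for `interval` seconds in parallel
    photo = ImageTk.PhotoImage(Image.open(image_paths[0]))
    label.configure(image=photo)
    label.image = photo  # keep a reference so Tk doesn't garbage-collect it
    name = time.strftime("%Y%m%d-%H%M%S") + ".wav"
    threading.Thread(target=record, args=(interval, name)).start()

tk.Button(root, text="OK", command=show_and_record).pack()
root.mainloop()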

could not convert string to float 9DOF Razor IMU

I am receiving a "could not convert string to float" error in this code. Could anyone take a look to see what is wrong, please? I am working with the 9DOF Razor IMU connected to the PC using an FTDI board, both from SparkFun. I am trying to see the x, y and z axes and the yaw, pitch and roll bars moving while I rotate my Razor, but I am receiving this error. It's the first project I'm working on, so everything I am doing is based on tutorials and blogs. English is not my mother language, so sorry about any English mistakes I've made. Thank you.
# This script needs VPhyton, pyserial and pywin modules
from visual import *
import serial
import string
import math
from time import time
grad2rad = 3.141592/180.0
# Check your COM port and baud rate
ser = serial.Serial(port='COM13',baudrate=57600, timeout=1)
# Main scene
scene=display(title="9DOF Razor IMU test")
scene.range=(1.2,1.2,1.2)
#scene.forward = (0,-1,-0.25)
scene.forward = (1,0,-0.25)
scene.up=(0,0,1)
# Second scene (Roll, Pitch, Yaw)
scene2 = display(title='9DOF Razor IMU test',x=0, y=0, width=500, height=200,center=(0,0,0), background=(0,0,0))
scene2.range=(1,1,1)
scene.width=500
scene.y=200
scene2.select()
#Roll, Pitch, Yaw
cil_roll = cylinder(pos=(-0.4,0,0),axis=(0.2,0,0),radius=0.01,color=color.red)
cil_roll2 = cylinder(pos=(-0.4,0,0),axis=(-0.2,0,0),radius=0.01,color=color.red)
cil_pitch = cylinder(pos=(0.1,0,0),axis=(0.2,0,0),radius=0.01,color=color.green)
cil_pitch2 = cylinder(pos=(0.1,0,0),axis=(-0.2,0,0),radius=0.01,color=color.green)
#cil_course = cylinder(pos=(0.6,0,0),axis=(0.2,0,0),radius=0.01,color=color.blue)
#cil_course2 = cylinder(pos=(0.6,0,0),axis=(-0.2,0,0),radius=0.01,color=color.blue)
arrow_course = arrow(pos=(0.6,0,0),color=color.cyan,axis=(-0.2,0,0), shaftwidth=0.02, fixedwidth=1)
#Roll,Pitch,Yaw labels
label(pos=(-0.4,0.3,0),text="Roll",box=0,opacity=0)
label(pos=(0.1,0.3,0),text="Pitch",box=0,opacity=0)
label(pos=(0.55,0.3,0),text="Yaw",box=0,opacity=0)
label(pos=(0.6,0.22,0),text="N",box=0,opacity=0,color=color.yellow)
label(pos=(0.6,-0.22,0),text="S",box=0,opacity=0,color=color.yellow)
label(pos=(0.38,0,0),text="W",box=0,opacity=0,color=color.yellow)
label(pos=(0.82,0,0),text="E",box=0,opacity=0,color=color.yellow)
label(pos=(0.75,0.15,0),height=7,text="NE",box=0,color=color.yellow)
label(pos=(0.45,0.15,0),height=7,text="NW",box=0,color=color.yellow)
label(pos=(0.75,-0.15,0),height=7,text="SE",box=0,color=color.yellow)
label(pos=(0.45,-0.15,0),height=7,text="SW",box=0,color=color.yellow)
L1 = label(pos=(-0.4,0.22,0),text="-",box=0,opacity=0)
L2 = label(pos=(0.1,0.22,0),text="-",box=0,opacity=0)
L3 = label(pos=(0.7,0.3,0),text="-",box=0,opacity=0)
# Main scene objects
scene.select()
# Reference axis (x,y,z)
arrow(color=color.green,axis=(1,0,0), shaftwidth=0.02, fixedwidth=1)
arrow(color=color.green,axis=(0,-1,0), shaftwidth=0.02 , fixedwidth=1)
arrow(color=color.green,axis=(0,0,-1), shaftwidth=0.02, fixedwidth=1)
# labels
label(pos=(0,0,0.8),text="9DOF Razor IMU test",box=0,opacity=0)
label(pos=(1,0,0),text="X",box=0,opacity=0)
label(pos=(0,-1,0),text="Y",box=0,opacity=0)
label(pos=(0,0,-1),text="Z",box=0,opacity=0)
# IMU object
platform = box(length=1, height=0.05, width=1, color=color.red)
p_line = box(length=1,height=0.08,width=0.1,color=color.yellow)
plat_arrow = arrow(color=color.green,axis=(1,0,0), shaftwidth=0.06, fixedwidth=1)
f = open("Serial"+str(time())+".txt", 'w')
roll=0
pitch=0
yaw=0
while 1:
    line = ser.readline()
    line = line.replace("!ANG:","")  # Delete "!ANG:"
    print line
    f.write(line)  # Write to the output log file
    words = string.split(line,",")  # Fields split
    if len(words) > 2:
        try:
            roll = float(words[0])*grad2rad
            pitch = float(words[1])*grad2rad
            yaw = float(words[2])*grad2rad
        except:
            print "Invalid line"
        axis=(cos(pitch)*cos(yaw),-cos(pitch)*sin(yaw),sin(pitch))
        up=(sin(roll)*sin(yaw)+cos(roll)*sin(pitch)*cos(yaw),sin(roll)*cos(yaw)-cos(roll)*sin(pitch)*sin(yaw),-cos(roll)*cos(pitch))
        platform.axis=axis
        platform.up=up
        platform.length=1.0
        platform.width=0.65
        plat_arrow.axis=axis
        plat_arrow.up=up
        plat_arrow.length=0.8
        p_line.axis=axis
        p_line.up=up
        cil_roll.axis=(0.2*cos(roll),0.2*sin(roll),0)
        cil_roll2.axis=(-0.2*cos(roll),-0.2*sin(roll),0)
        cil_pitch.axis=(0.2*cos(pitch),0.2*sin(pitch),0)
        cil_pitch2.axis=(-0.2*cos(pitch),-0.2*sin(pitch),0)
        arrow_course.axis=(0.2*sin(yaw),0.2*cos(yaw),0)
        L1.text = str(float(words[0]))
        L2.text = str(float(words[1]))
        L3.text = str(float(words[2]))
ser.close
f.close
You must be using some old tutorials if they are suggesting stuff like string.split(line, ",").
From your source, it seems the problem is here:
L1.text = str(float(words[0]))
L2.text = str(float(words[1]))
L3.text = str(float(words[2]))
You are trying to convert something to a float that Python can't parse as a number. From the looks of it, these are text labels, so why not try:
L1.text = words[0]
L2.text = words[1]
L3.text = words[2]
Some more tips:
words = line.split(',') # instead of: words = string.split(line,",")
ser.close() # close is a function, so you should call it.
f.close()
Make the following changes to the code. Replace the line
line = line.replace("!ANG:","")
with
line = line.replace("#YPR=","")
then replace the split line as suggested earlier, and that should do the trick!
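
Putting both answers together, the parsing part of the loop would look roughly like this (the #YPR= prefix comes from the second answer; this assumes the firmware emits lines such as #YPR=12.3,4.5,6.7):

line = ser.readline()
line = line.replace("#YPR=", "")  # strip the frame prefix the firmware sends
words = line.split(",")           # split the three comma-separated fields
if len(words) > 2:
    try:
        roll = float(words[0])*grad2rad
        pitch = float(words[1])*grad2rad
        yaw = float(words[2])*grad2rad
    except ValueError:
        print "Invalid line"
    # show the raw field text in the labels instead of re-converting to float
    L1.text = words[0]
    L2.text = words[1]
    L3.text = words[2]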

Use decodebin with adder

I'm trying to create an audio stream that has a constant audio source (in this case, audiotestsrc) to which I can occasionally add sounds from files (of various formats, which is why I'm using decodebin) through the play_file() method. I use an adder for that purpose. However, for some reason, I cannot add the second sound correctly. Not only does the program play the sound incorrectly, it also completely stops the original audiotestsrc. Here's my code so far:
import gst
import gobject
gobject.threads_init()

pipe = gst.Pipeline()
adder = gst.element_factory_make("adder", "adder")
first_sink = adder.get_request_pad('sink%d')
pipe.add(adder)

test = gst.element_factory_make("audiotestsrc", "test")
test.set_property('freq', 100)
pipe.add(test)
testsrc = test.get_pad("src")
testsrc.link(first_sink)

output = gst.element_factory_make("alsasink", "output")
pipe.add(output)
adder.link(output)

pipe.set_state(gst.STATE_PLAYING)
raw_input('Press key to play sound')

def play_file(filename):
    adder_sink = adder.get_request_pad('sink%d')
    audiofile = gst.element_factory_make('filesrc', 'audiofile')
    audiofile.set_property('location', filename)
    decoder = gst.element_factory_make('decodebin', 'decoder')
    def on_new_decoded_pad(element, pad, last):
        pad.link(adder_sink)
    decoder.connect('new-decoded-pad', on_new_decoded_pad)
    pipe.add(audiofile)
    pipe.add(decoder)
    audiofile.link(decoder)
    pipe.set_state(gst.STATE_PAUSED)
    pipe.set_state(gst.STATE_PLAYING)

play_file('sample.wav')
while True:
    pass
Thanks to moch on #gstreamer, I realized that all adder inputs must have the same format. I modified the above script so that the caps "audio/x-raw-int, endianness=(int)1234, channels=(int)1, width=(int)16, depth=(int)16, signed=(boolean)true, rate=(int)11025" (for example) are enforced before every input to the adder.
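
A sketch of one way to enforce that, using the old gst-python 0.10 API from the question; the audioconvert/audioresample/capsfilter chain and the helper name are illustrative, not from the original post:

caps = gst.Caps("audio/x-raw-int, endianness=(int)1234, channels=(int)1, "
                "width=(int)16, depth=(int)16, signed=(boolean)true, rate=(int)11025")

def link_through_caps(src_pad, adder_sink):
    # convert and resample each input to the common format before the adder
    convert = gst.element_factory_make("audioconvert")
    resample = gst.element_factory_make("audioresample")
    capsfilter = gst.element_factory_make("capsfilter")
    capsfilter.set_property("caps", caps)
    for element in (convert, resample, capsfilter):
        pipe.add(element)
        element.set_state(gst.STATE_PLAYING)
    src_pad.link(convert.get_pad("sink"))
    convert.link(resample)
    resample.link(capsfilter)
    capsfilter.get_pad("src").link(adder_sink)

In play_file(), the decodebin pad callback would then call link_through_caps(pad, adder_sink) instead of linking the decoded pad to the adder directly.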
