I have two WAV files that I want to mix together to form one WAV file. They both use the same sample format, etc.

I've been searching Google endlessly.

I would prefer to do it using the wave module in Python.

How can this be done?

You can use the pydub library (a light wrapper I wrote around the Python wave module in the standard library) to do it pretty simply:

from pydub import AudioSegment

sound1 = AudioSegment.from_file("/path/to/my_sound.wav")
sound2 = AudioSegment.from_file("/path/to/another_sound.wav")

combined = sound1.overlay(sound2)

combined.export("/path/to/combined.wav", format="wav")
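As far as I know, overlay keeps the length of sound1 (anything in sound2 past its end is dropped) and adds the two signals sample by sample, saturating at the sample-format limits instead of wrapping around. A pure-Python sketch of that idea on lists of 16-bit samples; overlay_samples is a hypothetical helper for illustration, not pydub's actual implementation, and its position is an offset in samples (pydub's own position argument is in milliseconds):

```python
INT16_MIN, INT16_MAX = -32768, 32767


def overlay_samples(base, other, position=0):
    """Add `other` onto `base`, starting at sample index `position` (>= 0).

    Hypothetical helper: the result is always as long as `base`, samples of
    `other` that would land past its end are dropped, and each sum is clamped
    to the signed 16-bit range rather than wrapping around.
    """
    out = list(base)
    for i, s in enumerate(other):
        j = position + i
        if j >= len(out):
            break  # `other` runs past the end of `base`: ignore the rest
        out[j] = max(INT16_MIN, min(INT16_MAX, out[j] + s))
    return out


print(overlay_samples([1, 2, 3, 4], [10, 10, 10, 10, 10], position=2))
# -> [1, 2, 13, 14]
print(overlay_samples([30000], [30000]))
# -> [32767]
```

Clamping costs a little distortion on loud passages, but it avoids the much harsher artifacts you get when an overflowing sum wraps around to the opposite sign.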

A Python solution that requires both NumPy and audiolab, but is fast and simple:

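import numpy as np
from scikits.audiolab import wavread, wavwrite

data1, fs1, enc1 = wavread("file1.wav")
data2, fs2, enc2 = wavread("file2.wav")

assert fs1 == fs2
assert enc1 == enc2
# NumPy addition needs equal shapes, so trim both signals to the shorter one
n = min(len(data1), len(data2))
result = 0.5 * data1[:n] + 0.5 * data2[:n]

# write the mix with the same sampling rate and encoding as the inputs
wavwrite(result, "mix.wav", fs1, enc1)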

If the sampling rates (fs*) or encodings (enc*) differ, you may need some audio processing (strictly speaking, the asserts are too strong, as wavread can handle some cases transparently).
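The 0.5/0.5 weighting is what keeps the result in range: if the samples are floats in [-1, 1] and the two weights add up to 1, every mixed sample is a convex combination of two in-range values and so also lies in [-1, 1]. A plain-Python sketch of the same weighted average with an adjustable balance; mix_weighted is a hypothetical helper, and unlike NumPy addition (which needs arrays of matching shape) it simply stops at the shorter input:

```python
def mix_weighted(a, b, weight=0.5):
    """Return weight * a + (1 - weight) * b, sample by sample.

    Hypothetical helper. With float samples in [-1.0, 1.0] and
    0 <= weight <= 1 the output stays in [-1.0, 1.0], so no clipping
    is needed. zip() stops at the shorter input.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


print(mix_weighted([1.0, -1.0, 0.5], [1.0, -1.0]))  # -> [1.0, -1.0]
print(mix_weighted([1.0], [0.0], weight=0.25))      # -> [0.25]
```

Moving weight away from 0.5 makes one file louder relative to the other while still guaranteeing the mix never exceeds full scale.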


import librosa
import IPython as ip

y1, sample_rate1 = librosa.load(audio1, mono=True)
y2, sample_rate2 = librosa.load(audio2, mono=True)

librosa.display.waveplot((y1+y2)/2, sr=int((sample_rate1+sample_rate2)/2))

ip.display.Audio((y1+y2)/2, rate=int((sample_rate1+sample_rate2)/2))

This depends heavily on the format the files are in. Here's an example of how to do it assuming 2-byte-wide, signed, little-endian samples (ordinary 16-bit PCM):

import wave

w1 = wave.open("/path/to/wav/1", "rb")
w2 = wave.open("/path/to/wav/2", "rb")

# get the samples as a raw byte string
samples1 = w1.readframes(w1.getnframes())
samples2 = w2.readframes(w2.getnframes())

# take every 2 bytes and group them together as 1 sample (b"123456" -> [b"12", b"34", b"56"])
samples1 = [samples1[i:i+2] for i in range(0, len(samples1), 2)]
samples2 = [samples2[i:i+2] for i in range(0, len(samples2), 2)]

# convert samples from bytes to signed ints
def bin_to_int(raw):
    as_int = 0
    for byte in raw[::-1]:  # iterate over each byte in reverse (because little-endian)
        # shift what we have so far up by 8 bits and put this byte in the lowest 8 bits
        as_int <<= 8
        as_int += byte  # iterating over bytes yields ints in Python 3
    # 16-bit WAV samples are signed (two's complement)
    if as_int >= 0x8000:
        as_int -= 0x10000
    return as_int

samples1 = [bin_to_int(s) for s in samples1]  # [b'\x04\x08'] -> [0x0804]
samples2 = [bin_to_int(s) for s in samples2]

# average the samples (integer division keeps them ints); zip stops at the shorter file
samples_avg = [(s1 + s2) // 2 for (s1, s2) in zip(samples1, samples2)]

And now all that's left to do is convert samples_avg back to a byte string and write it to a file using the wave module's writeframes. That's just the inverse of what we just did, so it shouldn't be too hard to figure out. For your int_to_bin function, you'll want to split each int back into its low and high bytes: in Python 2 that means chr(code), which returns the character with code code (the opposite of ord); in Python 3, n.to_bytes(2, "little", signed=True) does it in one call.
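One way to sketch that last step, assuming the same signed 16-bit little-endian samples as above: int_to_bin splits each int into its low and high byte (the inverse of bin_to_int), and write_samples hands the bytes to the wave module. Both names are hypothetical, and the round trip through an in-memory file at the end is only there to check the byte layout:

```python
import io
import wave


def int_to_bin(n):
    """Inverse of bin_to_int: one signed 16-bit sample -> 2 little-endian bytes."""
    n &= 0xFFFF  # two's complement, e.g. -1 -> 0xFFFF (out-of-range values wrap)
    return bytes([n & 0xFF, n >> 8])  # low byte first (little-endian)


def write_samples(f, samples, nchannels=1, framerate=44100):
    """Write signed 16-bit ints to `f` (a filename or binary file object) as WAV."""
    with wave.open(f, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(2)  # 2 bytes per sample
        out.setframerate(framerate)
        out.writeframes(b"".join(int_to_bin(s) for s in samples))


# Round trip through an in-memory file to check the bytes are laid out right.
buf = io.BytesIO()
write_samples(buf, [0x0804, -1, 1234])
buf.seek(0)
with wave.open(buf, "rb") as check:
    raw = check.readframes(check.getnframes())
print(raw[:2])  # -> b'\x04\x08'
```

With the variables from the answer above, the real call would be something like write_samples("/path/to/combined.wav", samples_avg, w1.getnchannels(), w1.getframerate()).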

You guys like NumPy, no? Below is a solution that depends on wave and NumPy. The samples in the two files './file1.wav' and './file2.wav' are added together; since the sum can overflow the 16-bit range, it is clipped with np.clip before being converted back to int16 and saved.

import wave
import numpy as np

# load two files you'd like to mix
fnames = ["./file1.wav", "./file2.wav"]
wavs = [wave.open(fn, "rb") for fn in fnames]
frames = [w.readframes(w.getnframes()) for w in wavs]
# here's efficient numpy conversion of the raw byte buffers
# '<i2' is a little-endian two-byte integer.
samples = [np.frombuffer(f, dtype="<i2") for f in frames]
samples = [samp.astype(np.float64) for samp in samples]
# mix as much as possible
n = min(map(len, samples))
mix = samples[0][:n] + samples[1][:n]
# the sum can exceed the 16-bit range, so clip it before converting back
mix = np.clip(mix, -32768, 32767)
# Save the result, reusing the first file's parameters (channels, rate, sample width)
mix_wav = wave.open("./mix.wav", "wb")
mix_wav.setparams(wavs[0].getparams())
# before saving, we want to convert back to '<i2' bytes:
mix_wav.writeframes(mix.astype("<i2").tobytes())
mix_wav.close()

Try the Echo Nest Remix API:

from echonest import audio
from util import *

def mixSound(fname1, fname2, f_out_name):
    f1 = audio.AudioData(fname1)
    f2 = audio.AudioData(fname2)

    f_out = audio.mix(f1, f2)
    f_out.encode(f_out_name, True)

If it complains about codecs, check https://superuser.com/questions/196857/how-to-install-libmp3lame-for-ffmpeg.