Acquiring Data

## load libraries
import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np
import cmath as cm
from scipy import signal as sps
from IPython.display import HTML

Acquiring Data

What is a signal

In neuroscience, most signals are voltage varying in time. Many bio-signals are amplified and filtered electric potentials (ie. voltages), such as ECG, EEG, EMG, etc. Others might be the result of a transducer, a device converting different physical measurements (eg. distance, pressure) to (usually) voltage. A notable exception are all video based signals, such as eye tracking. These are usually recorded as some arbitrary units directly in digital form.

Analog to Digital

Most signals are measured as analog signals. This means they are continuous over time. Computers can't store or even process analog signals, we therefore use an analog to digital converter (ADC) which does what it says on the tin. This step is called discretization. From here on, continuous and analog as well as discrete and digital are going to be used interchangeably. Here is an example of a pure sine wave discretized at different sampling rates.

import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate


# We'll use this interpolation to rebuild an analogue signal from digital samples
# credit for this implementation goes to endolith: https://gist.github.com/endolith
def sinc_interp(x, s, u):
    """
    Interpolates x, sampled at "s" instants
    Output y is sampled at "u" instants ("u" for "upsampled")
    
    from Matlab:
    http://phaseportrait.blogspot.com/2008/06/sinc-interpolation-in-matlab.html        
    """
    # Validate arguments
    if len(x) != len(s):
        raise(Exception, 'x and s must be the same length')
    
    # Find the period    
    T = s[1] - s[0]    
    
    transposed = np.transpose(np.tile(np.arange(len(s)), (len(u),1))) * T    
    
    sincM = np.tile(u, (len(s), 1)) - transposed
    y = np.dot(x, np.sinc(sincM / T))
    
    return y
        

#Set up figure
fig, axes = plt.subplots(3,4,figsize=(12,4),sharex='col',sharey='row')

# The signal will be a pure sinusoidal, at a frequency of 3 Hz (Hertz, Hz = 1/s)
frequency = 3;
# A high sampling rate (number of recorded data points per time unit) will give the
# illusion of an analog signal, you will not be able to see the time steps.
# At a lower sampling rate, you can clearly see when a point was sampled
# If the sampling rate is at or below the Nyquist frequency, the acquired signal will be erroneous.

# Now lets discretize this with a sampling rate of 10 Hz

# We'll look at a dozen frequencies
sampling_rates = [50,25,12,7,6.2,6,5.8,5,4.5,4,3,2.9];#np.logspace(np.log10(96),np.log10(1.5),num=8);

# Compare all signals to a sampling rate of 50 Hz. Used by the interpolation
reference_sampling_rate = 50;
reference_time = np.arange(-10,10,1/reference_sampling_rate);
reference_time = np.append(reference_time,1);
reference_voltage = np.sin(2*np.pi*frequency*reference_time)

# Iterate over axes and sampling rates
for (axe,sampling_rate) in zip(axes.reshape(-1),sampling_rates):
    # Create the time vector.
    time = np.arange(-10.01,10,1/sampling_rate);
    
    # Compute the sinusoidal (x = Amplitude*sin(angular_frequency * time))
    # Where angular_frequency = 2 * pi * frequency
    voltage  =  np.sin(2*np.pi*frequency*time);
    
    # In order to get a better idea of the signal represented by the samples it is interpolated
    # and plotted on top of the discretized signal. this step is not part of discretization.
    # interpolate the signal using Whittaker-Shannon Interpolation
    interpolated_voltage = sinc_interp(voltage,time,reference_time);
    
    axe.stem(time,voltage,use_line_collection=True);
    axe.plot(reference_time,interpolated_voltage,'r');
    axe.plot(reference_time,reference_voltage,'g.')
    axe.set_title('Sampling Rate = {:05.2f} Hz'.format(sampling_rate))
    axe.set_xlim(0,1)
    

plt.tight_layout()

These plots show us that there is a qualitative change in the interpolated signal when the sampling rate drops below 6 Hz. While at 6 Hz the signal still looks more or less like the original (well, kinda), at 5.8 the interpolated signal suddenly is in antiphase. Even lower, completely new signals with much lower frequency appear. Notably, at 3 Hz (the fundamental frequency of the original signal) the signal completely flattens out to a DC (Direct Current) Signal. This makes sense, since each time we take a sample, the signal will be at exactly the same amplitude.

The Nyquist-Shanon Theorem

These effects are called aliasing and are caused by a sampling rate that does not comply with the Whittaker-Nyquist-Kotelnikov-Shannon theroem. There were multiple, independent, discoveries of this theorem. It is most commonly known as the Nyquist or the Nyquist-Shannon Theorem.

In the words of Claude Shannon:

If a function x(t) contains no frequencies higher than B Hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart

In other words, a signal is completely determined (ie. there is no information lost) if the sampling rate is more than twice as high as the highest frequency component present in the signal. This frequency of 2B is called the Nyquist rate (or frequency). In the example it is 6Hz since the highest frequency (and only) component is at 3 Hz.

A look at the plots shows that there are some issues: frequencies complying with the theorem but close to the Nyquist rate are still quite distorted. We use the best possible interpolation method: the sinc interpolation (named after the sinc function described above). But we have only sampled over a limited time window. Would we extend it to $\pm \infty$ we should be able to reconstruct the signal perfectly. In theory.

The point of these sampling theorems is not just to avoid losing information. At 6 and 3 Hz the signal flatlines, this is due to the sampling rate being at or at twice, the frequency of the signal. In this case, we sampled every time the signal passed through 0. So ignoring the theorem can lead to completely asinine results.

Here is an auditory example.

WARNING: check your sound volume: loud high pitched sounds

We'll use a perfect fifth harmony in two octaves:C7 (2093Hz) - F8 (2794Hz) and C8 (4186Hz) - F8 (5588Hz).

First, we'll sample it at 44100 Hz. This is the typical sample rate used for music.

import numpy as np
import simpleaudio as sa

# define a playback duration
note_length = 1
# define sampling rate
sr = 44100
# compute the frequencies of notes
f = [pow(2,(96-69)/12)*440,pow(2,(101-69)/12)*440,
    pow(2,(108-69)/12)*440,pow(2,(113-69)/12)*440]
# define some variable amplitudes for the notes
amplitudes = np.array([1,1,0.8,0.8])/len(f);
# compute time vector
t = np.arange(0,note_length,1/sr)
# compute audio signal
signal = np.sum(np.array([a*np.sin(2*np.pi*ff*t) for a in amplitudes for ff in f]),0)/len(f)
audio = signal * (2**15 -1)
audio = np.int16(audio)
play_obj = sa.play_buffer(audio,1,2,sr)
play_obj.wait_done()

Nothing special so far. now let's crank the sampling rate way down to 11025 Hz. This is typically what you would hear on a phone line

# define sampling rate
sr = 11025
# compute time vector
t = np.arange(0,note_length,1/sr)
# compute audio signal
signal = np.sum(np.array([np.sin(2*np.pi*ff*t) for ff in f]),0)/len(f)
audio = signal * (2**15 -1)
audio = np.int16(audio)
play_obj = sa.play_buffer(audio,1,2,sr)
play_obj.wait_done()

Even though nothing major happened, there is an issue with the higher notes. There is a distinctive electronic flair to it. This is the aliasing; due to the improper sampling rate, completely new frequencies are introduced to the signal.

Now let's go all the way down to 8000 Hz. You might encounter this on walkie-talkies.

# define sampling rate
sr = 8000
# compute time vector
t = np.arange(0,note_length,1/sr)
# compute audio signal
signal = np.sum(np.array([np.sin(2*np.pi*ff*t) for ff in f]),0)/len(f)
audio = signal * (2**15 -1)
audio = np.int16(audio)
play_obj = sa.play_buffer(audio,1,2,sr)
play_obj.wait_done()

Now the pitch is hardly identifiable. It sounds more like a modem from the 90's than a perfect fifth harmonic.