A hardcore dad builds a "smart baby monitor": automatic cry notifications, and it can even analyze what the cries mean

Source: 泰然健康網  Time: November 26, 2024, 16:20

Source: Medium, 大數據文摘

Compiled by: 陳之炎

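As a new dad and a programmer, the question I ponder most in my new role is: "Can the work of caring for a baby really not be automated?"

Of course, it might be possible. But even if diaper-changing robots existed (assuming enough parents agreed to test such a device on their own toddlers), few parents would actually be willing to automate baby care.

As a father, the first thing I realized is that babies cry a lot, and even when I'm at home I can't always hear my child crying.

Commercial baby monitors usually fill this gap: they act as intercoms that let you hear the baby's cries from another room.

But I soon realized that commercial baby monitors are less smart than the ideal device I had in mind: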

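They only act as a relay: they carry sound from the source to a speaker, but they can't work out what the baby's cries mean;

When a parent moves to another room, the speaker has to be carried along, because the sound can't be played on any other existing audio device;

The speaker is usually low-powered and can't be connected to external speakers, which means that if I'm playing music in another room I may not hear the baby cry, even when the monitor is in the same room as me;

Most of them work on low-power radio waves, which means they often stop working if the baby is in his/her room and you have to go downstairs.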

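So I came up with the idea of building a better "smart baby monitor" myself.

No sooner said than done: I started by defining the features this "smart baby monitor" would need.

It can run on an inexpensive Raspberry Pi with a USB microphone.

It should detect the baby's cries when the baby starts/stops crying and notify me (ideally on my phone), or track the data points on my dashboard, or run the corresponding tasks. It shouldn't be a mere intercom that simply passes sound from one source to another compatible device.

It can stream the audio to speakers, smartphones, computers and other devices.

It isn't affected by the distance between the source and the speaker, so there is no need to move the speaker around the house.

It should also have a camera, so that I can monitor the child in real time and, as soon as he starts crying, capture a picture or a short video of the crib to check what's wrong.

Let's see how a new dad uses an engineer's brain and open-source tools to get this done.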

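Collecting audio samples

First, get a Raspberry Pi and flash a Linux operating system onto an SD card (a Raspberry Pi 3 or later is recommended) so that it can run the Tensorflow model. You will also need a USB microphone that is compatible with the Raspberry Pi.

Then install the required dependencies: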

[sudo] apt-get install ffmpeg lame libatlas-base-dev alsa-utils
[sudo] pip3 install tensorflow

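As a first step, you need to record enough audio samples of when the baby cries and when the baby doesn't. These samples will be used later to train the audio detection model.

Note: in this example I'll show how to use sound detection to recognize a baby's cries, but exactly the same procedure can be used to detect any other type of sound, as long as it lasts long enough (e.g. an alarm or the neighbor's drilling).

First, list the audio input devices: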

arecord -l

On my Raspberry Pi I get the following output (note that I have two USB microphones):

**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
card 2: Device_1 [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0

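I use the second microphone to record sound, i.e. card 2, device 0. The ALSA way of identifying it is either hw:2,0 (which accesses the hardware device directly) or plughw:2,0 (which inserts sample rate and format conversion plugins if required). Make sure you have enough space on the SD card, then start recording some audio: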

arecord -D plughw:2,0 -c 1 -f cd | lame - audio.mp3

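Record a few minutes or hours of audio while in the same room as the child, ideally with long stretches of silence, baby cries and other unrelated sounds, and press Ctrl-C when you are done. Repeat this process as many times as you can, so that you get different audio samples from different moments of the day or from different days.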

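Labeling the audio samples

Once you have enough audio samples, copy them to your computer to train the model: you can either use scp to copy the files or copy them directly from the SD card.

Store them all under the same directory, e.g. ~/datasets/sound-detect/audio. Also create a new folder for each sample. Each folder contains an audio file (named audio.mp3) and a labels file (named labels.json) that marks the negative/positive audio segments within the audio file. The structure of the raw dataset looks like this: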

~/datasets/sound-detect/audio
  -> sample_1
    -> audio.mp3
    -> labels.json
  -> sample_2
    -> audio.mp3
    -> labels.json
  ...

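Next comes labeling the recorded audio files, which can be particularly masochistic if they contain hours of your baby's cries. Open each dataset audio file in your favorite audio player or in Audacity, and create a new labels.json file in each sample directory. Identify the exact times where the cries start and end, and report them in labels.json as a key-value structure of the form time_string -> label. Example: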

{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}

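In the example above, all the audio segments between 00:00 and 02:12 will be labeled as negative, all the audio segments between 02:13 and 04:56 will be labeled as positive, and so on.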
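To make the labeling rule concrete, here is a minimal sketch of how a labels.json file of this form could be turned into a lookup from a time offset to a label. It is not micmon's own parser: the helper names (parse_timestamp, load_labels, label_at) are hypothetical, and the sketch assumes that every timestamp is written as MM:SS.

```python
import bisect
import json


def parse_timestamp(ts):
    """Convert an 'MM:SS' string into a number of seconds."""
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)


def load_labels(labels_json):
    """Return (start_second, label) pairs sorted by start time."""
    raw = json.loads(labels_json)
    return sorted((parse_timestamp(ts), label) for ts, label in raw.items())


def label_at(labels, second):
    """Return the label in effect at the given offset (in seconds)."""
    starts = [start for start, _ in labels]
    # Index of the last segment that starts at or before `second`.
    index = bisect.bisect_right(starts, second) - 1
    if index < 0:
        raise ValueError(f"no label defined before second {second}")
    return labels[index][1]


# Same content as the labels.json example above.
EXAMPLE = """
{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}
"""

labels = load_labels(EXAMPLE)
for ts in ("02:12", "02:13", "04:56", "04:57"):
    print(ts, label_at(labels, parse_timestamp(ts)))
```

Each label stays in effect until the next timestamp, which is why only the transitions need to be written down.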

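Generating the dataset

Once all the audio samples are labeled, the next step is to generate the dataset that will eventually be fed to the Tensorflow model. For this I created a generic library and set of utilities for sound monitoring called micmon. Let's start by installing it: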

git clone git@github.com:/BlackLight/micmon.git
cd micmon
[sudo] pip3 install -r requirements.txt
[sudo] python3 setup.py build install

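The model is designed to work on frequency samples of the audio rather than on raw audio. The reason is that we want to detect a specific sound, and that sound has a specific "spectral" signature: a fundamental frequency (or a narrow band in which the fundamental usually falls) and a specific set of harmonics. The ratios between these harmonic frequencies and the fundamental are affected neither by amplitude (the frequency ratios are constant regardless of the input volume) nor by phase (a continuous sound has the same spectral signature regardless of when you start recording it).

This amplitude- and phase-invariant property makes this approach more likely to produce a robust sound detection model than simply feeding raw audio samples to a model. The model can also be simpler: multiple frequencies can be grouped into bins without affecting performance, which effectively implements dimensionality reduction. It is smaller too: the model takes 50 to 100 frequency bands as input values regardless of the sample duration, whereas one second of raw audio usually contains 44100 data points and the input length grows with the duration of the sample. Finally, it is less prone to overfitting.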
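The invariance claim can be checked on a synthetic signal. The sketch below is only an illustration and is not how micmon computes its spectra: it uses a naive O(N^2) discrete Fourier transform from the standard library to stay dependency-free (real code would normally use an FFT routine such as numpy.fft.rfft), and the test tone (a fundamental plus a weaker second harmonic) is an assumption made for the demo.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for the first N/2 frequency bins (O(N^2))."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]


def normalize(spectrum):
    """Scale a spectrum so that its largest component is 1."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]


def tone(n, fundamental, amplitude=1.0, phase=0.0):
    """Fundamental plus a half-strength second harmonic.

    `fundamental` is in cycles per window of n samples; starting the
    recording later shifts the harmonic's phase twice as much.
    """
    return [
        amplitude * (
            math.sin(2 * math.pi * fundamental * t / n + phase)
            + 0.5 * math.sin(2 * math.pi * 2 * fundamental * t / n + 2 * phase)
        )
        for t in range(n)
    ]


N = 128
quiet = normalize(magnitude_spectrum(tone(N, 8, amplitude=0.2)))
loud_shifted = normalize(magnitude_spectrum(tone(N, 8, amplitude=3.0, phase=1.0)))

# Largest difference between the two normalized spectra (close to zero).
max_difference = max(abs(a - b) for a, b in zip(quiet, loud_shifted))
peak_bin = quiet.index(max(quiet))
print(peak_bin, round(quiet[2 * peak_bin], 6), max_difference < 1e-9)
```

Both signals produce the same normalized spectrum: a peak at the fundamental's bin and a harmonic at half that height, even though one recording is fifteen times louder and starts at a different point in the waveform.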

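micmon computes the FFT (Fast Fourier Transform) of segments of the audio samples, groups the resulting spectrum into bands between a low and a high cutoff frequency, and saves the result to a set of numpy compressed (.npz) files. You can do this from the command line with the micmon-datagen command: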

micmon-datagen --low 250 --high 2500 --bins 100 --sample-duration 2 --channels 1 ~/datasets/sound-detect/audio ~/datasets/sound-detect/data

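In the example above we generate a dataset from the raw audio samples stored under ~/datasets/sound-detect/audio and store the resulting spectral data under ~/datasets/sound-detect/data. --low and --high identify the lowest and the highest frequency to be taken into account. The lowest frequency defaults to 20 Hz (the lowest frequency audible to the human ear) and the highest defaults to 20 kHz (the highest frequency audible to a healthy, young human ear).

You will usually want to restrict this range, so that it captures as much as possible of the sound you want to detect while limiting other kinds of audio background and unrelated harmonics. In my case, a 250-2500 Hz range is good enough to detect a baby's cries.

Baby cries are usually high-pitched (the highest note an opera soprano can reach is around 1000 Hz). Here the highest frequency is set to at least double that, to make sure we get enough of the higher harmonics (harmonics are higher frequencies), but not so high that harmonics from other background sounds creep in. I also cut out anything below 250 Hz, since a baby's cry is unlikely to have much going on at those low frequencies. A good approach is to open some positive audio samples in an equalizer/spectrum analyzer, check which frequencies are dominant in the positive samples, and center the dataset around those frequencies. --bins specifies the number of groups for the frequency space (default: 100). A larger number means a higher frequency resolution/granularity, but if it is too high it may make the model prone to overfitting.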

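The script splits the original audio into smaller segments and calculates the spectral signature of each segment. --sample-duration specifies how long each of these segments should be (default: 2 seconds). A larger value may work better for sounds that last longer, but it also means waiting longer for each detection, and it will probably fail on short sounds. A lower value can be used for shorter sounds, but the captured segments may not carry enough information to identify the sound reliably.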
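The trade-offs between --sample-duration and --bins are easier to see with numbers. The helper below is hypothetical (it is not part of micmon) and assumes that the recording is cut into consecutive, non-overlapping segments and that the frequency range is divided into bins of equal width; micmon's actual segmentation may differ.

```python
def dataset_geometry(recording_seconds, sample_duration=2.0, low=250, high=2500,
                     bins=100, sample_rate=44100):
    """Rough sizes implied by the micmon-datagen options used above."""
    return {
        # Number of labeled segments (training samples) in the recording.
        "segments": int(recording_seconds // sample_duration),
        # Width of each frequency bin, assuming equal-width bins.
        "bin_width_hz": (high - low) / bins,
        # Raw audio values in one segment, before the FFT.
        "raw_points_per_segment": int(sample_rate * sample_duration),
        # Values actually fed to the model for one segment.
        "inputs_per_segment": bins,
    }


one_hour = dataset_geometry(3600)
longer_segments = dataset_geometry(3600, sample_duration=5.0)
print(one_hour)
print(longer_segments)
```

With the options from the command above, one hour of audio yields 1800 segments, each reduced from 88200 raw values to 100 inputs. Moving to 5-second segments leaves only 720 training samples per hour.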

Besides the micmon-datagen script, you can also write your own script that generates the dataset through the micmon API. Example:

import os

from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
from micmon.dataset import DatasetWriter

basedir = os.path.expanduser('~/datasets/sound-detect')
audio_dir = os.path.join(basedir, 'audio')
datasets_dir = os.path.join(basedir, 'data')
cutoff_frequencies = [250, 2500]

# Scan the base audio_dir for labelled audio samples
audio_dirs = AudioDirectory.scan(audio_dir)

# Save the spectrum information and labels of the samples to a
# different compressed file for each audio file.
for audio_dir in audio_dirs:
    dataset_file = os.path.join(datasets_dir, os.path.basename(audio_dir.path) + '.npz')
    print(f'Processing audio sample {audio_dir.path}')

    with AudioFile(audio_dir) as reader, \
            DatasetWriter(dataset_file,
                          low_freq=cutoff_frequencies[0],
                          high_freq=cutoff_frequencies[1]) as writer:
        for sample in reader:
            writer += sample

Whether you used micmon-datagen or the micmon Python API, at the end of the process you should find a bunch of .npz files under ~/datasets/sound-detect/data, one for each labeled audio file in the original dataset. This dataset can then be used to train a neural network for sound detection.

Training the model

micmon uses Tensorflow+Keras to define and train the model, and this is easy to do with the Python API. For example:

import os
from tensorflow.keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser('~/models/sound-detect')

# This is the number of training epochs for each dataset sample
epochs = 2

# Load the datasets from the compressed files.
# 70% of the data points will be included in the training set,
# 30% of the data points will be included in the evaluation set
# and used to evaluate the performance of the model.
datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have 75% of the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2 * freq_bins), activation='relu'),
        layers.Dense(int(0.75 * freq_bins), activation='relu'),
        layers.Dense(len(labels), activation='softmax'),
    ],
    labels=labels,
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)

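After running this script (and once you are happy with the model's accuracy), you will find the new model saved under ~/models/sound-detect. In my case it was enough to collect ~5 hours of sound and define a good frequency range to train a model with more than 98% accuracy. If you trained the model on your computer, just copy it to the Raspberry Pi and you are ready for the next step.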
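To get a feel for how small the network in the training script is, the sketch below counts its trainable parameters. It assumes 100 frequency bins (the --bins value used earlier) and two output labels. The helper name dense_parameter_count is hypothetical, and the count uses the standard rule for fully connected layers: one weight per input-output pair plus one bias per output.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        inputs * outputs + outputs
        for inputs, outputs in zip(layer_sizes, layer_sizes[1:])
    )


freq_bins = 100  # assumed: matches --bins 100 from the dataset step
layer_sizes = [freq_bins, int(2 * freq_bins), int(0.75 * freq_bins), 2]

print(layer_sizes)
print(dense_parameter_count(layer_sizes))
```

Under these assumptions the layers have 100, 200, 75 and 2 units, for about 35 thousand parameters in total, which is a size a Raspberry Pi can comfortably evaluate every couple of seconds.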

Using the model for predictions

Now it's time to write a script that uses the previously trained model to notify us when the baby starts crying:

import os

from micmon.audio import AudioDevice
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:2,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording

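Run the script on the Raspberry Pi and leave it running for a while: it prints negative to standard output if no cry was detected over the past 2 seconds, and positive if a cry was detected over the past 2 seconds.

However, simply printing a message to standard output when the baby cries isn't very useful: we want clear, real-time notifications!

Platypush can provide this. In this example we'll use the pushbullet integration to send a message to our phone when a cry is detected. Next, install Redis (which Platypush uses to receive messages) and Platypush with the HTTP and Pushbullet integrations: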

[sudo] apt-get install redis-server
[sudo] systemctl start redis-server.service
[sudo] systemctl enable redis-server.service
[sudo] pip3 install 'platypush[http,pushbullet]'

Install the Pushbullet app on your smartphone and go to pushbullet.com to get an API token. Then create a ~/.config/platypush/config.yaml file that enables the HTTP and Pushbullet integrations:

backend.http:
  enabled: True

pushbullet:
  token: YOUR_TOKEN

Next, modify the previous script so that, instead of printing a message to standard output, it triggers a custom event (CustomEvent) that can be captured by a Platypush hook:

#!/usr/bin/python3

import argparse
import logging
import os
import sys

from platypush import RedisBus
from platypush.message.event.custom import CustomEvent

from micmon.audio import AudioDevice
from micmon.model import Model

logger = logging.getLogger('micmon')


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('model_path', help='Path to the file/directory containing the saved Tensorflow model')
    parser.add_argument('-i', help='Input sound device (e.g. hw:0,1 or default)', required=True, dest='sound_device')
    parser.add_argument('-e', help='Name of the event that should be raised when a positive event occurs', required=True, dest='event_type')
    parser.add_argument('-s', '--sound-server', help='Sound server to be used (available: alsa, pulse)', required=False, default='alsa', dest='sound_server')
    parser.add_argument('-P', '--positive-label', help='Model output label name/index to indicate a positive sample (default: positive)', required=False, default='positive', dest='positive_label')
    parser.add_argument('-N', '--negative-label', help='Model output label name/index to indicate a negative sample (default: negative)', required=False, default='negative', dest='negative_label')
    parser.add_argument('-l', '--sample-duration', help='Length of the FFT audio samples (default: 2 seconds)', required=False, type=float, default=2., dest='sample_duration')
    parser.add_argument('-r', '--sample-rate', help='Sample rate (default: 44100 Hz)', required=False, type=int, default=44100, dest='sample_rate')
    parser.add_argument('-c', '--channels', help='Number of audio recording channels (default: 1)', required=False, type=int, default=1, dest='channels')
    parser.add_argument('-f', '--ffmpeg-bin', help='FFmpeg executable path (default: ffmpeg)', required=False, default='ffmpeg', dest='ffmpeg_bin')
    parser.add_argument('-v', '--verbose', help='Verbose/debug mode', required=False, action='store_true', dest='debug')
    parser.add_argument('-w', '--window-duration', help='Duration of the look-back window (default: 10 seconds)', required=False, type=float, default=10., dest='window_length')
    parser.add_argument('-n', '--positive-samples', help='Number of positive samples detected over the window duration to trigger the event (default: 1)', required=False, type=int, default=1, dest='positive_samples')

    opts, args = parser.parse_known_args(sys.argv[1:])
    return opts


def main():
    args = get_args()
    if args.debug:
        logger.setLevel(logging.DEBUG)

    model_dir = os.path.abspath(os.path.expanduser(args.model_path))
    model = Model.load(model_dir)
    window = []
    cur_prediction = args.negative_label
    bus = RedisBus()

    # The window stores one prediction per audio sample, so convert the
    # look-back duration (seconds) into a number of samples before comparing.
    window_size = max(1, int(args.window_length / args.sample_duration))

    with AudioDevice(system=args.sound_server,
                     device=args.sound_device,
                     sample_duration=args.sample_duration,
                     sample_rate=args.sample_rate,
                     channels=args.channels,
                     ffmpeg_bin=args.ffmpeg_bin,
                     debug=args.debug) as source:
        for sample in source:
            source.pause()  # Pause recording while we process the frame
            prediction = model.predict(sample)
            logger.debug(f'Sample prediction: {prediction}')
            has_change = False

            # Keep only the most recent window_size predictions
            if len(window) < window_size:
                window += [prediction]
            else:
                window = window[1:] + [prediction]

            positive_samples = len([pred for pred in window if pred == args.positive_label])
            if args.positive_samples <= positive_samples and \
                    prediction == args.positive_label and \
                    cur_prediction != args.positive_label:
                cur_prediction = args.positive_label
                has_change = True
                logging.info(f'Positive sample threshold detected ({positive_samples}/{len(window)})')
            elif args.positive_samples > positive_samples and \
                    prediction == args.negative_label and \
                    cur_prediction != args.negative_label:
                cur_prediction = args.negative_label
                has_change = True
                logging.info(f'Negative sample threshold detected ({len(window)-positive_samples}/{len(window)})')

            # Only notify Platypush when the state actually changes
            if has_change:
                evt = CustomEvent(subtype=args.event_type, state=prediction)
                bus.post(evt)

            source.resume()  # Resume recording


if __name__ == '__main__':
    main()

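Save the script above as ~/bin/micmon_detect.py. The script triggers an event only if at least positive_samples positive samples are detected within the sliding window (this reduces the noise caused by prediction errors or temporary glitches), and only when the current prediction switches from negative to positive or back. The event is then dispatched to Platypush. The script is also generic: it works with other sound models (not necessarily a crying baby), other positive/negative labels, other frequency ranges and other types of output event.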
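The debouncing logic of the script can be isolated and tried without a microphone or a model. The sketch below is a simplified stand-in, not part of micmon or Platypush: the class name CryStateTracker is hypothetical, the window size is given directly as a number of samples (5 samples of 2 seconds for a 10-second window), and predictions are fed in as plain strings.

```python
from collections import deque


class CryStateTracker:
    """Debounce per-sample predictions over a sliding window."""

    def __init__(self, window_size=5, positive_samples=2,
                 positive_label="positive", negative_label="negative"):
        # deque(maxlen=...) drops the oldest prediction automatically.
        self.window = deque(maxlen=window_size)
        self.positive_samples = positive_samples
        self.positive_label = positive_label
        self.negative_label = negative_label
        self.state = negative_label

    def update(self, prediction):
        """Feed one prediction; return the new state if it changed, else None."""
        self.window.append(prediction)
        positives = sum(1 for p in self.window if p == self.positive_label)

        if (positives >= self.positive_samples
                and prediction == self.positive_label
                and self.state != self.positive_label):
            self.state = self.positive_label
            return self.state

        if (positives < self.positive_samples
                and prediction == self.negative_label
                and self.state != self.negative_label):
            self.state = self.negative_label
            return self.state

        return None


tracker = CryStateTracker(window_size=5, positive_samples=2)
stream = ["negative", "positive", "negative", "positive", "positive",
          "negative", "negative", "negative", "negative", "negative"]

events = []
for index, prediction in enumerate(stream):
    change = tracker.update(prediction)
    if change is not None:
        events.append((index, change))

print(events)
```

In this run the single positive at index 1 is ignored, the state flips to positive at index 3 once two positives are inside the window, and it flips back to negative only at index 8, when fewer than two positives remain in the window.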

Now create a Platypush hook that reacts to the event and sends a notification to your devices. First, create the Platypush scripts directory:

mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts
# Define the directory as a module
touch __init__.py
# Create a script for the baby-cry events
vi babymonitor.py

The content of babymonitor.py is:

from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.custom import CustomEvent


@hook(CustomEvent, subtype='baby-cry', state='positive')
def on_baby_cry_start(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby is crying!')


@hook(CustomEvent, subtype='baby-cry', state='negative')
def on_baby_cry_stop(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby stopped crying - good job!')

Create a service file for Platypush and start/enable the service, so that it starts automatically:

mkdir -p ~/.config/systemd/user
wget -O ~/.config/systemd/user/platypush.service https://raw.githubusercontent.com/BlackLight/platypush/master/examples/systemd/platypush.service
systemctl --user start platypush.service
systemctl --user enable platypush.service

Create a service file for the baby monitor, for example ~/.config/systemd/user/babymonitor.service:

[Unit]
Description=Monitor to detect my baby's cries
After=network.target sound.target

[Service]
ExecStart=/home/pi/bin/micmon_detect.py -i plughw:2,0 -e baby-cry -w 10 -n 2 ~/models/sound-detect
Restart=always
RestartSec=10

[Install]
WantedBy=default.target

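This service starts the microphone monitor on the ALSA device plughw:2,0. It fires an event with state=positive if at least 2 positive 2-second samples were detected over the past 10 seconds and the previous state was negative. It fires an event with state=negative if fewer than 2 positive samples were detected over the past 10 seconds and the previous state was positive. You can then start/enable the service: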

systemctl --user start babymonitor.service
systemctl --user enable babymonitor.service

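Verify that you receive a notification on your phone as soon as the baby starts crying. If you don't, check the labels of your audio samples, the architecture and parameters of the neural network, or the sample length/window/frequency band parameters.

Also, this is a relatively basic automation example, and you can extend it with more automation tasks. For example, you can send a request to another Platypush device (e.g. in the bedroom or living room) and use the TTS plugin to announce out loud that the baby is crying. You can also extend the micmon_detect.py script so that the captured audio samples are streamed over HTTP as well, for example with a Flask wrapper and ffmpeg for the audio conversion. Another interesting use case is to send data points to a local database when the baby starts/stops crying (see my previous article on how to use Platypush+PostgreSQL+Mosquitto+Grafana to create flexible, self-managed dashboards: https://towardsdatascience.com/how-to-build-your-home-infrastructure-for-data-collection-and-visualization-and-be-the-real-owner-af9b33723b0c). This is quite a useful set of data for tracking when the baby sleeps, is awake or needs feeding. Monitoring my baby was my original motivation for developing micmon, but the same procedure can be used to train and use models that detect other types of sound. Finally, consider using a good power bank or a lithium battery pack to make the monitor portable.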
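The idea of logging a data point each time the baby starts or stops crying can be sketched with nothing but the standard library. This is an illustration only: it uses sqlite3 instead of the PostgreSQL setup from the linked article, and the table name cry_events, the helper names and the timestamps are all hypothetical.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the event log; ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cry_events (ts REAL NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def record(conn, ts, state):
    """Store one state change; ts is a Unix-style timestamp in seconds."""
    conn.execute("INSERT INTO cry_events (ts, state) VALUES (?, ?)", (ts, state))
    conn.commit()


def crying_episodes(conn):
    """Pair each 'positive' event with the next 'negative' one.

    Returns a list of (start_timestamp, duration_in_seconds) tuples.
    """
    episodes, started = [], None
    rows = conn.execute("SELECT ts, state FROM cry_events ORDER BY ts")
    for ts, state in rows:
        if state == "positive" and started is None:
            started = ts
        elif state == "negative" and started is not None:
            episodes.append((started, ts - started))
            started = None
    return episodes


conn = open_log()
# Made-up timestamps standing in for the state changes sent by the monitor.
for ts, state in [(100, "positive"), (160, "negative"),
                  (500, "positive"), (530, "negative")]:
    record(conn, ts, state)

episodes = crying_episodes(conn)
print(episodes)
```

In a real setup, record() would be called from the two hook functions in babymonitor.py with the current time, and the resulting episodes could feed a dashboard.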

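Installing the baby camera

Once you have a good audio feed and a way of detecting cries, you may also want to add a video feed to keep an eye on the child. At first I mounted a PiCamera on the same Raspberry Pi 3 that I used for audio detection, but I later found this configuration quite impractical. Think about it: a Raspberry Pi 3, an attached battery pack and a camera make for a rather bulky combination. What you want is a light camera that can easily be mounted on a stand or a flexible arm and moved around, so that you can keep a close eye on the child wherever he/she is. In the end I opted for the smaller Raspberry Pi Zero, which is compatible with the PiCamera, paired with a small battery.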