D.A.W.N. - Digital Assistant for Wearable Neutronics (AI Assistant)
These instructions are still in the "do this" phase. They work for me, and I welcome your feedback.
Application Notes
- OpenAI API - The current cloud AI implementation uses OpenAI and requires an OpenAI API key. Obtaining a key is beyond the scope of this document; see OpenAI's documentation for details.
- If you do not want to use cloud AI, or you only need local command support, a flag is available to disable it.
Installation Notes (Required Software)
System Packages
sudo apt install libssl-dev
CMake 3.27.1
tar xvf cmake-3.27.1.tar.gz
cd cmake-3.27.1
./configure --system-curl
make -j8
sudo make install
spdlog
git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build && cd build
cmake .. && make -j8
sudo make install
espeak-ng (git)
Before we begin, remove the distribution's espeak-ng packages:
sudo apt purge espeak-ng-data libespeak-ng1 speech-dispatcher-espeak-ng
git clone https://github.com/rhasspy/espeak-ng.git
cd espeak-ng
./autogen.sh
./configure --prefix=/usr
make -j8 src/espeak-ng src/speak-ng
make
sudo make LIBDIR=/usr/lib/aarch64-linux-gnu install
Onnxruntime (git)
git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime
./build.sh --config Release --use_cuda --cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/aarch64-linux-gnu --build_shared_lib --skip_tests --parallel $(nproc) --arm
cd build/Linux/Release/
sudo make install
piper-phonemize (git)
git clone https://github.com/rhasspy/piper-phonemize.git
cd piper-phonemize
cd src && cp ../../onnxruntime/include/onnxruntime/core/session/*.h .
cd ..
mkdir build && cd build
cmake ..
make
sudo make install
piper (git)
git clone https://github.com/rhasspy/piper.git
cd piper
make
- You will see some errors from the copy steps at the end, but the build succeeds.
kaldi (git) (This is a REALLY long build process!)
sudo apt-get install sox subversion
sudo git clone -b vosk --single-branch --depth=1 https://github.com/alphacep/kaldi /opt/kaldi
sudo chown -R $USER /opt/kaldi
cd /opt/kaldi/tools
- Edit Makefile. Remove -msse -msse2 from openfst_add_CXXFLAGS.
make openfst cub
(Note: -j# doesn't seem to work here. This is a LONG build.)
./extras/install_openblas_clapack.sh
cd ../src
./configure --mathlib=OPENBLAS_CLAPACK --shared
make -j 10 online2 lm rnnlm
cd ../..
sudo git clone https://github.com/alphacep/vosk-api --depth=1
sudo chown -R $USER vosk-api
cd vosk-api/src
KALDI_ROOT=/opt/kaldi make -j8
cd ../c
- Edit Makefile. Add the following to LDFLAGS:
$(shell pkg-config --libs cuda-12.2 cudart-12.2) -lcusparse -lcublas -lcusolver -lcurand
make
- Choose a model (small or large; the remaining steps use vosk-model-en-us-0.22):
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
ln -s vosk-model-en-us-0.22 model
cp ../python/example/test.wav .
./test_vosk
Copy some files over for compiling (replace SOURCE_DIR with the path to the DAWN source directory):
cp -r vosk-model-en-us-0.22 SOURCE_DIR
cp ../src/vosk_api.h ../src/libvosk.so SOURCE_DIR
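Before running cmake, it can help to confirm that the copy step actually placed the Vosk files in the DAWN source directory. The following is a minimal Python sketch, not part of DAWN: the helper name missing_build_inputs is hypothetical, and the model directory name assumes you chose vosk-model-en-us-0.22.

```python
from pathlib import Path

# Entries the "Copy some files over for compiling" step places in SOURCE_DIR.
# Assumption: the larger model was chosen; use "vosk-model-small-en-us-0.15" otherwise.
REQUIRED = ["vosk-model-en-us-0.22", "vosk_api.h", "libvosk.so"]


def missing_build_inputs(source_dir):
    """Return the required entries that do not exist under source_dir (hypothetical helper)."""
    root = Path(source_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


if __name__ == "__main__":
    import sys

    # Usage: python check_inputs.py /path/to/SOURCE_DIR  (defaults to the current directory)
    missing = missing_build_inputs(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("missing:", ", ".join(missing) if missing else "nothing")
```

An empty result means all three entries are present; anything listed still needs to be copied.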
Build DAWN
mkdir build
cd build
cmake ..
make
DAWN Application Configuration Documentation (commands_config_nuevo.json)
DAWN uses a JSON configuration file to define local voice commands and the actions they trigger. This section describes the structure and purpose of each part of the file, focusing on how actions are defined and linked to specific devices, and how audio devices are configured.
Types and Actions
{
  "types": {
    "boolean": {
      "actions": {
        "enable": {
          "action_words": ["enable %device_name%", "turn on %device_name%", "switch on %device_name%", "show %device_name%", "display %device_name%", "open %device_name%", "start %device_name%"],
          "action_command": "{\"device\": \"%device_name%\", \"action\": \"enable\"}"
        },
        "disable": {
          "action_words": ["disable %device_name%", "turn off %device_name%", "switch off %device_name%", "hide %device_name%", "close %device_name%", "stop %device_name%"],
          "action_command": "{\"device\": \"%device_name%\", \"action\": \"disable\"}"
        }
      }
    }
  }
}
types: The categories of settings that can be adjusted or monitored within DAWN. These include boolean for toggle settings, analog for value-based adjustments, getter for retrieving information, and music for controlling audio playback.
actions: Defined within each type, actions describe the operations that can be performed. Each action has associated action_words, the voice commands DAWN recognizes to trigger the action, and an action_command, the MQTT JSON string sent to the target device to execute the action. The %device_name% placeholder in both fields stands for the name of the device being controlled.
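To make the roles of action_words, action_command, and %device_name% concrete, here is a minimal Python sketch of one way such a table could be used: substitute the device name into each phrase, compare it with the recognized text, and return the command with the same substitution. This is an illustration, not DAWN's actual matching code; the device name "armor display" and the function match_command are hypothetical, and the comparison is a simple case-insensitive exact match.

```python
import json

# Abbreviated copy of the "boolean" type shown above (only two phrases per action).
CONFIG = {
    "types": {
        "boolean": {
            "actions": {
                "enable": {
                    "action_words": ["enable %device_name%", "turn on %device_name%"],
                    "action_command": "{\"device\": \"%device_name%\", \"action\": \"enable\"}",
                },
                "disable": {
                    "action_words": ["disable %device_name%", "turn off %device_name%"],
                    "action_command": "{\"device\": \"%device_name%\", \"action\": \"disable\"}",
                },
            }
        }
    }
}

PLACEHOLDER = "%device_name%"


def match_command(config, device_type, device_name, utterance):
    """Return the filled-in action_command for the first matching phrase, or None."""
    spoken = utterance.strip().lower()
    for action in config["types"][device_type]["actions"].values():
        for phrase in action["action_words"]:
            # Substitute the device name into the phrase, then compare case-insensitively.
            if phrase.replace(PLACEHOLDER, device_name).lower() == spoken:
                return action["action_command"].replace(PLACEHOLDER, device_name)
    return None


command = match_command(CONFIG, "boolean", "armor display", "Turn on armor display")
print(command)  # {"device": "armor display", "action": "enable"}
print(json.loads(command)["action"])  # enable
```

Because the placeholder is replaced in the command string as well, the result is a ready-to-send JSON payload.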
Devices
This section lists the various devices controlled by DAWN, detailing how voice commands translate into specific actions for each device:
type: Links the device to one of the defined types (e.g., boolean, analog), dictating the nature of its control.
aliases: Alternative names or phrases that can also refer to the device, enhancing the system's ability to recognize voice commands intended for it.
topic: The MQTT topic the device publishes to, ensuring that commands are accurately directed in the network.
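The following Python sketch shows how a spoken name or alias might be resolved to a device, its type, and its topic. The layout of the devices entry (keyed by device name, like the audio devices section) and the values shown are assumptions for illustration, not taken from a real configuration; resolve_device is a hypothetical helper.

```python
# Hypothetical "devices" entry; the fields mirror the type/aliases/topic description above.
DEVICES = {
    "armor display": {
        "type": "boolean",
        "aliases": ["hud", "heads up display"],
        "topic": "hud",
    },
}


def resolve_device(devices, spoken_name):
    """Map a spoken device name or alias to (device_name, type, topic), or None if unknown."""
    spoken = spoken_name.strip().lower()
    for name, entry in devices.items():
        names = [name] + entry.get("aliases", [])
        if spoken in (n.lower() for n in names):
            return name, entry["type"], entry["topic"]
    return None


print(resolve_device(DEVICES, "HUD"))  # ('armor display', 'boolean', 'hud')
```

In this sketch the returned type selects which entry under types supplies the action_words, and the returned topic identifies where on the MQTT network the resulting action_command is directed.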
Audio Devices
This section configures the audio input and output devices so that DAWN can set up and use the audio hardware correctly. It is independent of the rest of the configuration.
Each audio device is keyed by its function (e.g., microphone, headphones, speakers) and has the following fields:
type: Identifies the role of the audio device within the system (e.g., audio capture device for microphones).
aliases: Additional names for the device, so users can refer to it in different ways.
device: The system identifier for the hardware, used by DAWN to apply the correct settings.
"audio devices": {
  "microphone": {
    "type": "audio capture device",
    "aliases": ["mic", "helmet mic", "audio input device"],
    "device": "alsa_input.usb-Creative_Technology_Ltd_Sound_Blaster_Play__3_00128226-00.analog-stereo"
  },
  "headphones": {
    "type": "audio playback device",
    "aliases": ["helmet"],
    "device": "combined"
  },
  "speakers": {
    "type": "audio playback device",
    "aliases": ["speaker", "loud speakers", "loud speaker", "chest speaker"],
    "device": "combined"
  }
}
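As an illustration of how an alias resolves to a system device identifier, this minimal Python sketch parses the audio devices fragment above (wrapped in braces so it forms a complete JSON object) and looks up a spoken name. lookup_audio_device is hypothetical; DAWN's own lookup may work differently.

```python
import json

# The "audio devices" fragment from the configuration, wrapped in braces to make valid JSON.
AUDIO_JSON = """{
  "audio devices": {
    "microphone": {
      "type": "audio capture device",
      "aliases": ["mic", "helmet mic", "audio input device"],
      "device": "alsa_input.usb-Creative_Technology_Ltd_Sound_Blaster_Play__3_00128226-00.analog-stereo"
    },
    "headphones": {"type": "audio playback device", "aliases": ["helmet"], "device": "combined"},
    "speakers": {
      "type": "audio playback device",
      "aliases": ["speaker", "loud speakers", "loud speaker", "chest speaker"],
      "device": "combined"
    }
  }
}"""


def lookup_audio_device(config, spoken):
    """Return (function, type, device) for a function name or alias, or None if unknown."""
    spoken = spoken.strip().lower()
    for function, entry in config["audio devices"].items():
        if spoken == function or spoken in (a.lower() for a in entry["aliases"]):
            return function, entry["type"], entry["device"]
    return None


audio_config = json.loads(AUDIO_JSON)
print(lookup_audio_device(audio_config, "chest speaker"))  # ('speakers', 'audio playback device', 'combined')
```

Note that headphones and speakers both resolve to the same device, combined.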
Hints:
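Use these commands to list the names of your output devices (sinks) and input devices (sources):
pactl list short sinks
pactl list short sources
- Set your audio devices in commands_config_nuevo.json.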
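The pactl commands above print one device per line in tab-separated columns, and the second column is the name that goes in a device field. This Python sketch extracts those names. It assumes PulseAudio-style tab-separated pactl output; the sample text is illustrative (made up for this example, not captured from a real system), and the helper names are hypothetical.

```python
import subprocess


def parse_pactl_short(output):
    """Return the device names (second tab-separated column) from `pactl list short ...` output."""
    names = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            names.append(fields[1])
    return names


def list_pactl_names(kind):
    """Run `pactl list short <kind>` where kind is "sinks" or "sources"; needs pactl on PATH."""
    result = subprocess.run(
        ["pactl", "list", "short", kind], capture_output=True, text=True, check=True
    )
    return parse_pactl_short(result.stdout)


# Illustrative sample of `pactl list short sources` output.
SAMPLE_SOURCES = (
    "0\talsa_output.platform-sound.analog-stereo.monitor\tmodule-alsa-card.c\ts16le 2ch 48000Hz\tSUSPENDED\n"
    "1\talsa_input.usb-Creative_Technology_Ltd_Sound_Blaster_Play__3_00128226-00.analog-stereo"
    "\tmodule-alsa-card.c\ts16le 2ch 48000Hz\tIDLE\n"
)

for name in parse_pactl_short(SAMPLE_SOURCES):
    print(name)
```

On the target system, list_pactl_names("sources") returns candidates for the microphone device, and list_pactl_names("sinks") returns candidates for the playback devices.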
Run DAWN
./dawn
Credits
Initial adaptation from the piper project: https://github.com/rhasspy/piper
Piper and the language models are covered under the MIT license. Vosk is licensed under the Apache License, Version 2.0.
