D.A.W.N. - Digital Assistant for Wearable Neutronics (AI Assistant)
These instructions are still at the "do this" stage: they work for me, and I welcome your feedback.
Application Notes
- OpenAI API - The current cloud AI implementation uses OpenAI and requires an OpenAI API key. Getting an API key is beyond the scope of this document; please see OpenAI's documentation for details.
- If you do not wish to use cloud AI or you only want local command support, there is a flag to disable it.
Installation Notes (Required Software)
System Packages
sudo apt install libssl-dev
CMake 3.27.1
tar xvf cmake-3.27.1.tar.gz
cd cmake-3.27.1
./configure --system-curl
make -j8
sudo make install
spdlog
git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build && cd build
cmake .. && make -j8
sudo make install
espeak-ng (git)
Before we begin:
sudo apt purge espeak-ng-data libespeak-ng1 speech-dispatcher-espeak-ng
git clone https://github.com/rhasspy/espeak-ng.git
cd espeak-ng
./autogen.sh
./configure --prefix=/usr
make -j8 src/espeak-ng src/speak-ng
make
sudo make LIBDIR=/usr/lib/aarch64-linux-gnu install
Onnxruntime (git)
git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime
./build.sh --config Release --use_cuda --cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/aarch64-linux-gnu --build_shared_lib --skip_tests --parallel $(nproc) --arm
cd build/Linux/Release/
sudo make install
piper-phonemize (git)
git clone https://github.com/rhasspy/piper-phonemize.git
cd piper-phonemize
cd src && cp ../../onnxruntime/include/onnxruntime/core/session/*.h .
cd ..
mkdir build && cd build
cmake ..
make
sudo make install
piper (git)
git clone https://github.com/rhasspy/piper.git
cd piper
make
- The build reports some errors on the copy steps at the end, but it still builds.
kaldi (git) (This is a REALLY long build process!)
sudo apt-get install sox subversion
sudo git clone -b vosk --single-branch --depth=1 https://github.com/alphacep/kaldi /opt/kaldi
sudo chown -R $USER /opt/kaldi
cd /opt/kaldi/tools
- Edit Makefile. Remove -msse -msse2 from openfst_add_CXXFLAGS.
make openfst cub
(Note: -j# doesn't seem to work here.) LONG BUILD
./extras/install_openblas_clapack.sh
cd ../src
./configure --mathlib=OPENBLAS_CLAPACK --shared
make -j 10 online2 lm rnnlm
cd ../..
sudo git clone https://github.com/alphacep/vosk-api --depth=1
sudo chown -R $USER vosk-api
cd vosk-api/src
KALDI_ROOT=/opt/kaldi make -j8
cd ../c
- Edit Makefile. Add the following to LDFLAGS:
$(shell pkg-config --libs cuda-12.2 cudart-12.2) -lcusparse -lcublas -lcusolver -lcurand
make
- Choose a model:
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
ln -s vosk-model-en-us-0.22 model
cp ../python/example/test.wav .
./test_vosk
Copy some files over for compiling
cp -r vosk-model-en-us-0.22 SOURCE_DIR
cp ../src/vosk_api.h ../src/libvosk.so SOURCE_DIR
Build DAWN
mkdir build
cd build
cmake ..
make
DAWN Application Configuration Documentation (commands_config_nuevo.json)
The DAWN application uses a JSON configuration file to define local voice commands and the actions they trigger. This documentation outlines the structure and purpose of each section of the file, focusing on how actions are defined and linked to specific devices, and on how audio devices are configured.
Types and Actions
{
  "types": {
    "boolean": {
      "actions": {
        "enable": {
          "action_words": ["enable %device_name%", "turn on %device_name%", "switch on %device_name%", "show %device_name%", "display %device_name%", "open %device_name%", "start %device_name%"],
          "action_command": "{\"device\": \"%device_name%\", \"action\": \"enable\"}"
        },
        "disable": {
          "action_words": ["disable %device_name%", "turn off %device_name%", "switch off %device_name%", "hide %device_name%", "close %device_name%", "stop %device_name%"],
          "action_command": "{\"device\": \"%device_name%\", \"action\": \"disable\"}"
        }
      }
    }
  }
}
- types: Represent the different categories of settings that can be adjusted or monitored within the DAWN system. These include boolean for toggle settings, analog for value-based adjustments, getter for retrieving information, and music for controlling audio playback.
- actions: Defined within each type, actions describe what operations can be performed. Each action has associated action_words, which are the voice commands recognized by DAWN to trigger the action, and an action_command, the MQTT JSON string sent to the target device to execute the action.
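To make the placeholder mechanics concrete, here is a minimal Python sketch of how the %device_name% token in action_words and action_command could be expanded for one device. This is an illustration only, not DAWN's actual implementation; the function name expand_actions and the device name "visor" are made up for the example, and the action_words lists are trimmed copies of the ones above.

```python
import json

# Trimmed copy of the "boolean" type from the configuration shown above.
CONFIG = {
    "types": {
        "boolean": {
            "actions": {
                "enable": {
                    "action_words": ["enable %device_name%", "turn on %device_name%"],
                    "action_command": '{"device": "%device_name%", "action": "enable"}',
                },
                "disable": {
                    "action_words": ["disable %device_name%", "turn off %device_name%"],
                    "action_command": '{"device": "%device_name%", "action": "disable"}',
                },
            }
        }
    }
}


def expand_actions(config, type_name, device_name):
    """Return {spoken phrase: MQTT JSON payload} for one device of one type.

    Hypothetical helper: it only substitutes the %device_name% placeholder.
    """
    phrases = {}
    for action in config["types"][type_name]["actions"].values():
        payload = action["action_command"].replace("%device_name%", device_name)
        for words in action["action_words"]:
            phrases[words.replace("%device_name%", device_name)] = payload
    return phrases


# "visor" is an invented device name used only for this example.
table = expand_actions(CONFIG, "boolean", "visor")
print(table["turn on visor"])
```

Each expanded phrase maps to the JSON string that would be sent over MQTT, so several spoken variants of the same action share one payload.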
Devices
This section lists the various devices controlled by DAWN, detailing how voice commands translate into specific actions for each device:
- type: Links the device to one of the defined types (e.g., boolean, analog), dictating the nature of its control.
- aliases: Alternative names or phrases that can also refer to the device, enhancing the system's ability to recognize voice commands intended for it.
- topic: The MQTT topic the device publishes to, ensuring that commands are accurately directed in the network.
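As a rough illustration of how these three fields might work together, the Python sketch below resolves a spoken name or alias to a device entry and pairs the resulting JSON payload with the device's topic. Only the field names (type, aliases, topic) come from the description above; the "flashlight" device, its values, and both helper functions are hypothetical, and the matching rule (exact, case-insensitive) is an assumption rather than DAWN's documented behavior.

```python
import json

# Hypothetical "devices" section. The field names follow the description
# above; the device and all of its values are invented for this example.
DEVICES = {
    "flashlight": {
        "type": "boolean",
        "aliases": ["torch", "light"],
        "topic": "helmet",
    }
}


def resolve_device(devices, spoken_name):
    """Map a spoken name or alias to (canonical name, entry), or None."""
    wanted = spoken_name.strip().lower()
    for name, entry in devices.items():
        if wanted == name or wanted in entry.get("aliases", []):
            return name, entry
    return None


def build_message(devices, spoken_name, action):
    """Return (topic, JSON payload) for a device, or None if it is unknown."""
    match = resolve_device(devices, spoken_name)
    if match is None:
        return None
    name, entry = match
    # Assumption: the payload carries the canonical name, not the alias.
    payload = json.dumps({"device": name, "action": action})
    return entry["topic"], payload


print(build_message(DEVICES, "torch", "enable"))
```

An unknown name returns None, which leaves the caller free to decide whether to ignore the phrase or hand it to the cloud AI.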
Audio Devices
Specific to the configuration of audio input and output devices, this section allows DAWN to correctly set up and use audio hardware. This is independent of the rest of the configuration.
Each audio device is categorized by its function (e.g., microphone, headphones, speakers), with detailed configurations for effective operation.
- type: Identifies the role of the audio device within the system (e.g., audio capture device for microphones).
- aliases: Provides additional identification terms for each device, facilitating user interaction.
- device: The system identifier for the hardware, used by DAWN to apply the correct settings.
"audio devices": {
  "microphone": {
    "type": "audio capture device",
    "aliases": ["mic", "helmet mic", "audio input device"],
    "device": "alsa_input.usb-Creative_Technology_Ltd_Sound_Blaster_Play__3_00128226-00.analog-stereo"
  },
  "headphones": {
    "type": "audio playback device",
    "aliases": ["helmet"],
    "device": "combined"
  },
  "speakers": {
    "type": "audio playback device",
    "aliases": ["speaker", "loud speakers", "loud speaker", "chest speaker"],
    "device": "combined"
  }
}
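The lookup this section implies can be sketched in a few lines of Python: given a spoken name or alias and the wanted role, return the system identifier from the device field. The data is the example above copied into a dict; the function find_audio_device and its exact, case-insensitive matching rule are assumptions for illustration, not DAWN's actual code.

```python
# The "audio devices" section from the example above, as a Python dict.
AUDIO_DEVICES = {
    "microphone": {
        "type": "audio capture device",
        "aliases": ["mic", "helmet mic", "audio input device"],
        "device": "alsa_input.usb-Creative_Technology_Ltd_Sound_Blaster_Play__3_00128226-00.analog-stereo",
    },
    "headphones": {
        "type": "audio playback device",
        "aliases": ["helmet"],
        "device": "combined",
    },
    "speakers": {
        "type": "audio playback device",
        "aliases": ["speaker", "loud speakers", "loud speaker", "chest speaker"],
        "device": "combined",
    },
}


def find_audio_device(audio_devices, spoken_name, wanted_type):
    """Return the system identifier for a name or alias of the given type.

    Hypothetical helper: returns None when nothing of that type matches.
    """
    wanted = spoken_name.strip().lower()
    for name, entry in audio_devices.items():
        if entry["type"] != wanted_type:
            continue
        if wanted == name or wanted in entry["aliases"]:
            return entry["device"]
    return None


print(find_audio_device(AUDIO_DEVICES, "chest speaker", "audio playback device"))
print(find_audio_device(AUDIO_DEVICES, "mic", "audio playback device"))
```

Filtering on type first means a capture alias such as "mic" never resolves to a playback device, which is why the second call finds nothing.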
Hints:
pactl list short sinks
pactl list short sources
- Set your audio devices in commands_config_nuevo.json.
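If you want to check the configured device names against what PulseAudio reports, something like the following Python sketch may help. It assumes the short listing from pactl is tab-separated with the device name in the second column. The SAMPLE text is invented for the example rather than captured from a real system, and all function names are hypothetical.

```python
import subprocess


def parse_pactl_short(output):
    """Return device names (second tab-separated column) from short pactl output."""
    names = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            names.append(fields[1])
    return names


def list_names(kind):
    """Run `pactl list short sinks` or `pactl list short sources`.

    kind must be "sinks" or "sources"; requires pactl on the PATH.
    """
    result = subprocess.run(
        ["pactl", "list", "short", kind],
        capture_output=True, text=True, check=True,
    )
    return parse_pactl_short(result.stdout)


def missing_devices(configured, available):
    """Return configured device names that the system did not report."""
    return sorted(set(configured) - set(available))


# Invented sample output, used so the example runs without PulseAudio.
SAMPLE = (
    "0\talsa_output.platform-sound.analog-stereo\tmodule-alsa-card.c\ts16le 2ch 44100Hz\tSUSPENDED\n"
    "1\tcombined\tmodule-combine-sink.c\ts16le 2ch 44100Hz\tIDLE\n"
)

available = parse_pactl_short(SAMPLE)
print(missing_devices(["combined", "no_such_sink"], available))
```

On a real system, replace parse_pactl_short(SAMPLE) with list_names("sinks") or list_names("sources") and pass in the device values from your configuration file.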
Run DAWN
./dawn
Credits
Initial adaptation from the piper project: https://github.com/rhasspy/piper
Piper and the language models are covered under the MIT license. Vosk is licensed under the Apache License, Version 2.0.