D.A.W.N. - Digital Assistant for Wearable Neutronics (AI Assistant)

These instructions are still at the "do this" stage: they work for me, and I welcome your feedback.

Application Notes

OpenAI API - The current cloud AI implementation uses OpenAI and requires an OpenAI API key. Obtaining a key is beyond the scope of this document; see OpenAI's documentation for details.
If you do not want to use cloud AI, or only want local command support, there is a flag to disable it.

Installation Notes (Required Software)

System Packages

sudo apt install libssl-dev

CMake 3.27.1

tar xvf cmake-3.27.1.tar.gz
cd cmake-3.27.1
./configure --system-curl
make -j8
sudo make install

spdlog

git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build && cd build
cmake .. && make -j8
sudo make install

espeak-ng (git)

Before we begin: sudo apt purge espeak-ng-data libespeak-ng1 speech-dispatcher-espeak-ng

git clone https://github.com/rhasspy/espeak-ng.git
cd espeak-ng
./autogen.sh
./configure --prefix=/usr
make -j8 src/espeak-ng src/speak-ng
make
sudo make LIBDIR=/usr/lib/aarch64-linux-gnu install

ONNX Runtime (git)

git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime
./build.sh --config Release --use_cuda --cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/aarch64-linux-gnu --build_shared_lib --skip_tests --parallel $(nproc) --arm
cd build/Linux/Release/
sudo make install

piper-phonemize (git)

git clone https://github.com/rhasspy/piper-phonemize.git
cd piper-phonemize
cd src && cp ../../onnxruntime/include/onnxruntime/core/session/*.h .
cd ..
mkdir build && cd build
cmake ..
make
sudo make install

piper (git)

git clone https://github.com/rhasspy/piper.git
cd piper
make (Note: You'll see some errors from file copies at the end, but it builds.)

kaldi (git) (This is a REALLY long build process!)

sudo apt-get install sox subversion
sudo git clone -b vosk --single-branch --depth=1 https://github.com/alphacep/kaldi /opt/kaldi
sudo chown -R $USER /opt/kaldi
cd /opt/kaldi/tools
Edit Makefile. Remove -msse -msse2 from openfst_add_CXXFLAGS (these are x86-only flags and do not apply on ARM).
make openfst cub (Note: -j# doesn't seem to work here.) LONG BUILD
./extras/install_openblas_clapack.sh
cd ../src
./configure --mathlib=OPENBLAS_CLAPACK --shared
make -j 10 online2 lm rnnlm
cd ../..
sudo git clone https://github.com/alphacep/vosk-api --depth=1
sudo chown -R $USER vosk-api
cd vosk-api/src
KALDI_ROOT=/opt/kaldi make -j8
cd ../c
Edit Makefile. Add the following to LDFLAGS: $(shell pkg-config --libs cuda-12.2 cudart-12.2) -lcusparse -lcublas -lcusolver -lcurand
make
Choose a model and download it (the steps below use the larger vosk-model-en-us-0.22):
1. wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
2. wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
ln -s vosk-model-en-us-0.22 model
cp ../python/example/test.wav .
./test_vosk

Copy the files needed to build DAWN (replace SOURCE_DIR with the path to the DAWN source directory)

cp -r vosk-model-en-us-0.22 SOURCE_DIR
cp ../src/vosk_api.h ../src/libvosk.so SOURCE_DIR

Build DAWN

mkdir build
cd build
cmake ..
make

DAWN Application Configuration Documentation (`commands_config_nuevo.json`)

DAWN reads a JSON configuration file that defines its local voice commands and the actions they trigger. This documentation outlines the structure and purpose of each section of the file: how actions are defined, how they are linked to specific devices, and how audio devices are configured.

Types and Actions

{
  "types": {
     "boolean": {
        "actions": {
           "enable": {
              "action_words": ["enable %device_name%", "turn on %device_name%", "switch on %device_name%", "show %device_name%", "display %device_name%", "open %device_name%", "start %device_name%"],
              "action_command": "{\"device\": \"%device_name%\", \"action\": \"enable\"}"
           },
           "disable": {
              "action_words": ["disable %device_name%", "turn off %device_name%", "switch off %device_name%", "hide %device_name%", "close %device_name%", "stop %device_name%"],
              "action_command": "{\"device\": \"%device_name%\", \"action\": \"disable\"}"
           }
        }
     }
  }
}

types: The categories of settings that can be adjusted or monitored within the DAWN system: boolean for toggle settings, analog for value-based adjustments, getter for retrieving information, and music for controlling audio playback.
actions: Defined within each type, actions describe the operations that can be performed. Each action has action_words, the voice commands DAWN recognizes to trigger the action, and an action_command, the JSON string sent over MQTT to the target device to execute it. In both, %device_name% is a placeholder for the name of the device being addressed.
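
The relationship between action_words and action_command can be sketched in a few lines of Python. This is only an illustration of the idea, not DAWN's actual matching code: the function name match_command and the device name "visor" are hypothetical, and the configuration is a trimmed copy of the "boolean" type shown above.

```python
import json

# Trimmed copy of the "boolean" type from the configuration above.
CONFIG = {
    "types": {
        "boolean": {
            "actions": {
                "enable": {
                    "action_words": ["enable %device_name%", "turn on %device_name%"],
                    "action_command": "{\"device\": \"%device_name%\", \"action\": \"enable\"}",
                },
                "disable": {
                    "action_words": ["disable %device_name%", "turn off %device_name%"],
                    "action_command": "{\"device\": \"%device_name%\", \"action\": \"disable\"}",
                },
            }
        }
    }
}


def match_command(spoken, type_name, device_name, config=CONFIG):
    """Return the filled-in action_command for `spoken`, or None if nothing matches.

    Hypothetical helper: exact string matching is an assumption made for
    clarity and may differ from how DAWN compares recognized speech.
    """
    actions = config["types"][type_name]["actions"]
    for action in actions.values():
        for template in action["action_words"]:
            # Substitute the placeholder, then compare with the spoken phrase.
            if template.replace("%device_name%", device_name) == spoken:
                return action["action_command"].replace("%device_name%", device_name)
    return None


# "visor" is a made-up device name used only for this example.
command = match_command("turn on visor", "boolean", "visor")
payload = json.loads(command)  # the action_command is itself valid JSON
```

Here payload is the dictionary that would be serialized and sent to the device; a phrase that matches no template yields None.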

Devices

This section lists the various devices controlled by DAWN, detailing how voice commands translate into specific actions for each device:

type: Links the device to one of the defined types (e.g., boolean, analog), dictating the nature of its control.
aliases: Alternative names or phrases that can also refer to the device, enhancing the system's ability to recognize voice commands intended for it.
topic: The MQTT topic on which DAWN publishes commands for the device, so that they reach the intended target on the network.
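
As a rough sketch of how these three fields fit together, the Python below resolves a spoken name or alias to a device entry. Everything specific here is an assumption for illustration: the top-level layout, the device name "visor", its aliases, and the topic "helmet" do not come from the actual configuration file; only the field names type, aliases, and topic do.

```python
# Hypothetical devices section. Only the field names (type, aliases, topic)
# come from the description above; all values are made up for this example.
DEVICES = {
    "visor": {
        "type": "boolean",
        "aliases": ["face shield", "helmet visor"],
        "topic": "helmet",
    }
}


def resolve_device(spoken_name, devices=DEVICES):
    """Return (canonical_name, entry) for a device name or alias, else None.

    Case-insensitive exact matching is an assumption of this sketch.
    """
    wanted = spoken_name.strip().lower()
    for name, entry in devices.items():
        aliases = [alias.lower() for alias in entry.get("aliases", [])]
        if wanted == name.lower() or wanted in aliases:
            return name, entry
    return None


match = resolve_device("Helmet Visor")
```

Once a device is resolved, its type selects which set of actions applies and its topic says where the resulting command should be published.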

Audio Devices

Specific to the configuration of audio input and output devices, this section allows DAWN to set up and use the audio hardware correctly. It is independent of the rest of the configuration.

Each audio device is categorized by its function (e.g., microphone, headphones, speakers), with detailed configurations for effective operation.

type: Identifies the role of the audio device within the system (e.g., audio capture device for microphones).
aliases: Provides additional identification terms for each device, facilitating user interaction.
device: The system identifier for the hardware, used by DAWN to apply the correct settings.

  "audio devices": {
     "microphone": {
        "type": "audio capture device",
        "aliases": ["mic", "helmet mic", "audio input device"],
        "device": "alsa_input.usb-Creative_Technology_Ltd_Sound_Blaster_Play__3_00128226-00.analog-stereo"
     },
     "headphones": {
        "type": "audio playback device",
        "aliases": ["helmet"],
        "device": "combined"
     },
     "speakers": {
        "type": "audio playback device",
        "aliases": ["speaker", "loud speakers", "loud speaker", "chest speaker"],
        "device": "combined"
     }
  }

Hints:

pactl list short sinks
pactl list short sources
Use the device names these commands print to set your audio devices in commands_config_nuevo.json.
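
If you would rather check the configuration than compare names by eye, the Python sketch below extracts device names from pactl's short listing and reports configured devices that are not present. The helper names are hypothetical and the sample text is illustrative; the one thing it relies on is that each line of the short listing is tab-separated with the device name in the second field.

```python
import subprocess


def parse_pactl_short(output):
    """Extract device names from `pactl list short sinks` or `sources` output.

    Each line is tab-separated and the second field is the device name.
    """
    names = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            names.append(fields[1])
    return names


def list_devices(kind):
    """Run pactl for `kind` ("sinks" or "sources") and return the device names."""
    result = subprocess.run(
        ["pactl", "list", "short", kind],
        capture_output=True, text=True, check=True,
    )
    return parse_pactl_short(result.stdout)


def missing_devices(audio_devices, available):
    """Return the config keys whose "device" value is not in `available`."""
    return [key for key, entry in audio_devices.items()
            if entry["device"] not in available]


# Illustrative sample in the short-listing format; the values are made up.
SAMPLE = (
    "0\tcombined\tmodule-combine-sink.c\ts16le 2ch 44100Hz\tSUSPENDED\n"
    "1\talsa_output.example.analog-stereo\tmodule-alsa-card.c\ts16le 2ch 48000Hz\tIDLE\n"
)
sample_names = parse_pactl_short(SAMPLE)
```

On a real system, you would call list_devices("sinks") and list_devices("sources") and pass the results to missing_devices together with the "audio devices" section loaded from the JSON file.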

Run DAWN

./dawn

Credits

Initial adaptation from the piper project: https://github.com/rhasspy/piper

Piper and the language models are covered under the MIT license. Vosk is licensed under the Apache License, Version 2.0.