Aware Alexa Project

UCLA ECE M202 Project, Winter 2019

By Sydney Hung and Michael Lo

Introduction

The aim of this project is to make personal assistants like Alexa aware of a user's state so that they can respond accordingly. We built a system that interprets sensor readings and keeps track of each user's state. We demonstrate it by having Alexa distinguish between users as they walk into the house and greet each one accordingly.

Problem statement

Smart devices today are often integrated with cloud services, which allows one event to trigger another. For example, opening a door can trigger a light to turn on. However, these systems are limited to the sensor data currently available and cannot take into account what happened in the past. Our project focuses on a system that integrates multiple sensors with a customized AWS server, so that it can keep track of a finite set of states and use the current state to provide better context for a customized speech response.

Prior work

Last August, Amazon released the Contact and Motion Sensor API for Alexa. This API allows Alexa-enabled devices like the Amazon Echo to query and manage smart sensor readings. Before the introduction of this API, smart device manufacturers had to integrate their devices with their own cloud services to manage sensor devices. One popular smart device cloud service is SmartThings, developed by Samsung. SmartThings cloud services can query devices and notify users of device status by sending notifications to the user's phone. However, the service does not have a voice notification feature. The Alexa Contact and Motion Sensor API adds a speech interface to these cloud services. Even so, the APIs provided by Amazon and other smart device companies still lack the ability to provide additional context for a change in a smart device's status.

Technical approach

Our approach uses a contact sensor as the door sensor and two Raspberry Pis as Bluetooth sensing devices, combined with a server built on top of AWS Lambda. The door sensor provides the status of the door (open or closed); Bluetooth sensing lets us use RSSI readings to determine a user's location (inside or outside the house); and the server manages sensor readings and stores the state of the system.

Door Sensor

Our door sensor, produced by Ecolink, communicates over Z-Wave. A Z-Wave-compliant device was therefore needed to retrieve the sensor data, and we chose a SmartThings Hub for the project. The door sensor reports its status, along with other information such as battery health, to the SmartThings Hub. We direct this data to our backend through a custom SmartThings app that tells the hub to send the selected door sensor's open/close status to our AWS gateway via HTTP POST. When the gateway receives the door sensor's data, it updates our database, which keeps track of the state of the door.

Bluetooth Sensing

For user detection, we chose Bluetooth sensing because most users carry a smartphone that supports Bluetooth. Another benefit is that Bluetooth's short range lets us get distance readings within a few meters, which helps narrow down where the user is with respect to the door. User detection is implemented as a bash script running on the two Raspberry Pis. One Raspberry Pi is the anchor, mounted next to the door sensor. The other is placed inside the house to help determine whether a user is leaving or entering a room. On startup, each Pi retrieves the Bluetooth MAC addresses stored in our AWS database through HTTP POST. Once the addresses are retrieved, it continuously polls for the device associated with each MAC address by attempting to connect to it. While the connection attempt is in progress, the script runs another function to read the RSSI value. Once the connection timeout is reached, it sends the RSSI value to the server through HTTP POST.
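The polling pass described above can be sketched as follows. The project's script is written in bash; this is a Python illustration only, and the names poll_once, read_rssi, post_reading, and the payload fields are hypothetical. The Bluetooth read and the HTTP POST are passed in as functions so that the sketch makes no claim about the actual commands used. Skipping a device when no RSSI is read is also an assumption.

```python
from typing import Callable, Iterable, List, Optional


def poll_once(
    mac_addresses: Iterable[str],
    read_rssi: Callable[[str], Optional[int]],
    post_reading: Callable[[dict], None],
    sensor_id: str,
) -> int:
    """Make one sequential pass over the MAC addresses; return how many readings were sent."""
    sent = 0
    for mac in mac_addresses:
        # Hypothetical helper: attempts to connect and returns the RSSI seen
        # before the connection timeout, or None if no value could be read.
        rssi = read_rssi(mac)
        if rssi is None:
            continue  # assumption: nothing is reported when no RSSI was read
        # Hypothetical payload layout for the HTTP POST to the server.
        post_reading({"sensor": sensor_id, "mac": mac, "rssi": rssi})
        sent += 1
    return sent


# Usage with stand-ins for the Bluetooth read and the HTTP POST.
fake_rssi = {"AA:BB:CC:DD:EE:01": 0, "AA:BB:CC:DD:EE:02": -7}
posted: List[dict] = []
count = poll_once(
    ["AA:BB:CC:DD:EE:01", "AA:BB:CC:DD:EE:02", "AA:BB:CC:DD:EE:03"],
    fake_rssi.get,
    posted.append,
    "anchor",
)
```

Because each device is handled in turn, the time for one pass grows with the number of MAC addresses in the list.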

Server

We built our server on AWS Lambda. The server's jobs are to (1) receive data from the door sensor and the Raspberry Pis; (2) store the relevant data in the database, which uses AWS DynamoDB; (3) manage and keep track of users' states; and (4) create a speech response and signal an output device to play it.

We enabled the AWS API Gateway in front of our Lambda server, which allows the SmartThings Hub and the Raspberry Pis to send readings via HTTP POST. When door state data is received, the server updates the door state in the database. When an RSSI reading from the anchor Raspberry Pi at the door is received, the server updates the latest anchor RSSI value in the database and uses it to determine whether the user is near the door. When an RSSI reading from the inner Raspberry Pi is received, the server checks the historical RSSI values to see whether the user has just entered or left the house. If the server concludes that the user has entered or left, it sends a customized text message to AWS Polly, which synthesizes speech from it. The audio file produced by Polly is posted to AWS S3, where a Raspberry Pi can download it and play it to the user.
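A minimal sketch of how the three kinds of incoming readings might be routed, assuming each POST body is a JSON object. The field names (type, state, sensor, mac, rssi), the function handle_post, and the plain dictionary standing in for the database are all hypothetical; the speech and storage steps are left out.

```python
import json


def handle_post(body: str, db: dict) -> str:
    """Route one POSTed reading and update the stand-in database; return what was updated."""
    msg = json.loads(body)
    kind = msg.get("type")

    if kind == "door":
        # Door sensor update forwarded by the SmartThings app.
        db["door"] = msg["state"]
        return "door"

    if kind == "rssi":
        user = db["users"].setdefault(
            msg["mac"], {"anchor_rssi": None, "near_door": False, "inner_history": []}
        )
        if msg["sensor"] == "anchor":
            # Keep only the latest anchor value and derive "near the door" from it.
            user["anchor_rssi"] = msg["rssi"]
            user["near_door"] = msg["rssi"] >= 0
            return "anchor"
        if msg["sensor"] == "inner":
            # Keep a history so an entered/left check can compare past values.
            user["inner_history"].append(msg["rssi"])
            return "inner"

    raise ValueError(f"unrecognized message: {msg!r}")


# Usage with an in-memory stand-in for the database.
db = {"door": "closed", "users": {}}
handle_post('{"type": "door", "state": "open"}', db)
handle_post('{"type": "rssi", "sensor": "anchor", "mac": "AA:BB:CC:DD:EE:01", "rssi": 0}', db)
handle_post('{"type": "rssi", "sensor": "inner", "mac": "AA:BB:CC:DD:EE:01", "rssi": -3}', db)
```

In this sketch the entered/left decision would be made in the inner-sensor branch, after the new value is appended to the history.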

Database

Our database is built on AWS DynamoDB, a NoSQL database. One row is dedicated to the state of the door (open/closed), and the rest of the rows store user data, one row per user. Each user row contains information such as user info, the RSSI readings from both Raspberry Pis, and the location where the server thinks the user is.
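The row layout above could look roughly like this. Every attribute name here (row_id, state, greetings_enabled, and so on) is a guess made for illustration, and the sample user is a placeholder, not data from the project.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DoorRow:
    """The single row dedicated to the door."""
    row_id: str = "door"
    state: str = "closed"  # "open" or "closed"


@dataclass
class UserRow:
    """One row per user."""
    row_id: str
    name: str
    mac: str
    greetings_enabled: bool = True     # assumed to be the on/off greeting preference
    anchor_rssi: Optional[int] = None  # latest reading from the anchor Raspberry Pi
    inner_rssi: Optional[int] = None   # latest reading from the inner Raspberry Pi
    location: str = "unknown"          # where the server thinks the user is


# Placeholder contents: one door row followed by one user row.
table = [DoorRow(), UserRow(row_id="user-1", name="Alice", mac="AA:BB:CC:DD:EE:01")]
items = [asdict(row) for row in table]
```

Converting each row with asdict gives plain dictionaries, which is the general shape in which items are written to a key-value store.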

Alexa Skill

Finally, we created an Alexa skill that can be used to change users' information and preferences in the database. The skill allows users to change their name and MAC address and to turn greetings on or off.

Experimental methods

We tested the scenarios that are most likely to occur. Our expectation is that a user must transition through certain states for the correct response to occur. Only two state transitions can trigger a voice response. All other transitions are invalid and generate no response.

Valid State 1:

The user is getting readings >= 0 from both the anchor and the inner sensor, and the door is open. The user then goes through the door, so the anchor still reads >= 0 but the inner sensor now reads < 0. This generates a goodbye message for that user and plays it through a speaker.

Valid State 2:

The user is getting >= 0 from the anchor but < 0 from the inner sensor, and the door is open. The user then walks through the door, which causes both sensors to read >= 0. This generates a welcome message for that user and plays it through the speaker.
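The two valid transitions can be written as a small decision function. This is a strict reading of the two cases above, not necessarily the server's exact logic; Reading, near, and response_for are names introduced here for illustration.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    anchor_rssi: int
    inner_rssi: int


def near(rssi: int) -> bool:
    """A sensor counts as 'seeing' the user when its RSSI reading is >= 0."""
    return rssi >= 0


def response_for(prev: Reading, curr: Reading, door_open: bool) -> Optional[str]:
    """Return 'goodbye', 'welcome', or None for a pair of consecutive readings."""
    if not door_open:
        return None
    # Both valid cases keep the anchor reading >= 0 before and after.
    if not (near(prev.anchor_rssi) and near(curr.anchor_rssi)):
        return None
    if near(prev.inner_rssi) and not near(curr.inner_rssi):
        return "goodbye"  # Valid State 1: inner goes from >= 0 to < 0
    if not near(prev.inner_rssi) and near(curr.inner_rssi):
        return "welcome"  # Valid State 2: inner goes from < 0 to >= 0
    return None


leaving = response_for(Reading(0, 0), Reading(0, -4), door_open=True)
arriving = response_for(Reading(0, -4), Reading(0, 0), door_open=True)
```

Any other combination, including either transition with the door closed, produces no response in this sketch.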

Analysis and results

Our system performed as expected. When a user opens the door and walks through it, a message is generated and played depending on who the user is and which way the inner sensor's reading changed. The system also worked with multiple users. When one person was near the door and another walked through, a message was generated and played for the user who walked through the door. When both users walked through, two separate messages were generated and played.

Some cases triggered false message generation. The following diagram illustrates the problem. Zone 1 represents the area where the inner sensor reads RSSI values >= 0, zone 3 the area where the anchor sensor reads RSSI values >= 0, zone 2 the overlapping area where both sensors read >= 0, and zone 4 the area where both sensors read < 0. The yellow line is the doorway. If a user moved from zone 4 into zone 2 while the door was open, the server would think that the user had entered. Similarly, if a user moved from zone 2 to zone 4, the server would think that the user had left the room.
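The zone definitions can be expressed as a mapping from a pair of readings to a zone number. The sketch below treats zone 1 as inner-only and zone 3 as anchor-only, with the overlap counted as zone 2; that interpretation and the name zone_of are assumptions.

```python
def zone_of(anchor_rssi: int, inner_rssi: int) -> int:
    """Map one pair of RSSI readings to zone 1, 2, 3, or 4."""
    anchor_near = anchor_rssi >= 0
    inner_near = inner_rssi >= 0
    if anchor_near and inner_near:
        return 2  # overlap: both sensors read >= 0
    if inner_near:
        return 1  # only the inner sensor reads >= 0
    if anchor_near:
        return 3  # only the anchor sensor reads >= 0
    return 4      # both sensors read < 0


# The false-trigger paths described above, as (before, after) zone pairs.
false_entry = (zone_of(-5, -5), zone_of(0, 0))
false_exit = (zone_of(0, 0), zone_of(-5, -5))
```

Since the zone is derived from only two thresholded values, a single reading that flips sign is enough to move the user to a different zone.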

One major problem when collecting our results was that the RSSI readings were affected by noise. Readings were inconsistent, and the server would sometimes place the user in the wrong zone. For example, with the user in zone 2, the inner sensor might read a negative RSSI value, putting the user in zone 3 instead. This also made our logic heavily dependent on the inner sensor's placement with respect to the door: if it is too far away there is no overlap, and if it is too close it takes longer to generate messages.

Our system also felt slow when a valid event happened, because the Pi script operates sequentially. It scans the users one after another and, for every user, sends the reading to the server and waits for the server's command. As a result, the time between consecutive readings for one user depends on the number of Bluetooth MAC addresses being watched.

Future directions

We would like to improve the Bluetooth sensing algorithm. As we saw during testing, the RSSI readings can be affected by interference, which can cause the server's logic to misjudge whether a user is near the door or not. One option is to adjust the transmit power of the Bluetooth hardware so that the user's phone sees a connection attempt only within a much smaller radius. Whether or not the phone is inside that smaller radius would then tell us whether the user is near the door.

Instead of range sensing, we could use image recognition. A device could recognize a user's face and tell the server who is walking through the door. We could then use machine learning to determine whether the user has actually walked through the door. This would let the server know whether a user is entering or exiting a room.

Another improvement we would like to make is to response time. Currently, the system gets slower as more users are added because it handles tasks in sequence: it takes longer to get back to the same user's reading because the script has to go through all the other users first. One fix is to spawn a new bash shell for every MAC address so that the scans run in parallel. There is also some delay on the server side, because we generate and download the sound file every time when this only needs to be done once.
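The parallel scan can be illustrated in Python with one thread per MAC address instead of one bash shell per address; this is a swapped-in technique for illustration, not the project's script. The names poll_parallel and read_rssi are hypothetical, and whether a single Bluetooth adapter can service several connection attempts at once is an open question this sketch does not answer.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def poll_parallel(
    mac_addresses: Iterable[str],
    read_rssi: Callable[[str], Optional[int]],
) -> Dict[str, Optional[int]]:
    """Read every device concurrently and return a MAC-to-RSSI mapping."""
    macs = list(mac_addresses)
    if not macs:
        return {}
    # One worker per address, so no device waits for another's timeout.
    with ThreadPoolExecutor(max_workers=len(macs)) as pool:
        return dict(zip(macs, pool.map(read_rssi, macs)))


def slow_fake_read(mac: str) -> int:
    """Stand-in for a blocking Bluetooth read with a connection timeout."""
    time.sleep(0.3)
    return -len(mac)


start = time.monotonic()
readings = poll_parallel(["AA:01", "AA:02", "AA:03"], slow_fake_read)
elapsed = time.monotonic() - start
```

With three stand-in devices that each block for 0.3 seconds, a sequential pass would need about 0.9 seconds, while the threaded pass should finish in roughly the time of a single read.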