A Case Study in Prototyping with Computer Vision
BirdID: Identify and Predict Birds in Your Area
By Andreas Vasegaard Nielsen, Madeline F. Lund Pedersen, Mariam El Ouaamari, Nicklas Lindberg Larsen & Thomas Lohse
Artificial Intelligence (AI) is everywhere these days. We see many different uses of AI, e.g. “AI-infused products” or “AI-driven experiences”, and the AI terminology is developing as fast as the technology itself. This machine intelligence is the basis for a large number of new technological products finding their way to the market, and their subsequent interaction with users makes it relevant for researchers to investigate and develop AI products from a UX point of view.
In this article we explore and prototype our self-developed app, BirdID, as an AI-infused product. Through desk research of user needs, we position our idea in the market where we see an opportunity to fill a gap. The prototypes are user-tested at an early stage, using wireframes and a functioning interactive prototype, to see if the users understand the idea and to identify possible areas for improvement. Finally, we elaborate on our findings and reflect on the process.
The purpose is to showcase how commonly known UX tools can be used in conjunction with AI products, and thereby inspire fellow design students to develop and conduct their own research within this area.
Defining User Needs
In investigating the market for bird prediction apps and their users, we read through the review sections on the App Store and Play Store to identify what works well and where users struggle in some of the existing apps (Merlin Bird ID by Cornell Lab, Audubon Bird Guide, BirdsEye Bird Finding Guide). This research served as inspiration for our app, as we wanted not only to be innovative but also to be led by real user needs.
We found these basic functions, which we think work well and should be part of our BirdID app:
- Identifying a bird with a smartphone camera
- General knowledge of the bird
- Keep track of your own birds
- Geographic specific birds
We found the following user needs that we thought could be incorporated into our BirdID app:
- Bird prediction based on geographic data, current or self-chosen
- A prediction or forecast of what could be seen in your area
- A bird log that keeps track of your birds and adds data to the AI
- Possibility to add birds manually to the bird log with search function and database access
With these general functions and needs in mind, we compared them to the general description of the apps and took this as inspiration to develop our concept further.
“…snap a photo of the viewfinder on your camera, and Merlin’s powerful AI will suggest an identification almost instantly.” (Merlin Bird ID by Cornell Lab, n.d.)
“…help you identify the birds around you, keep track of the birds you’ve seen, and get outside to find new birds near you.” (Audubon Bird Guide, n.d.)
The iNaturalist app has also been a key inspiration for our development of the BirdID prototype. In the next section, we describe the concept of BirdID and dig further into the academic works that we used to map our concept. In general, based on our research, we wanted to create an app that:
Can predict which birds are present near you and help you identify any bird via a photo or a search of the database by characteristics. The app will show you which kind of food the bird prefers and other information such as habitat and calls.
This app is different because the AI will tell you exactly which birds you are most likely to see now and in the near future in your area, based on location, weather, season and bird registrations from you and other users.
The Usage of BirdID
To classify this AI-driven experience, we conducted a short mapping of BirdID based on the framework by Kliman-Silver et al. (2020). The framework classifies AI-driven experiences along three key dimensions, helping researchers ensure that they assess AI experiences using appropriate methods (Kliman-Silver et al., 2020).
Before we make use of this framework, we will go through a use case scenario for BirdID:
Imagine that you are looking out your window and see a really cool bird sitting in a tree. You want to know what type of bird it is, so you pull out your phone and open the BirdID app. You take a picture of the bird through the app, and it tells you what type of bird it is, what type of feed it eats, and other tidbits of information regarding the bird.
You are also able to see what type of birds might appear in your neighbourhood in the coming week, and what feed to prepare if you would like certain types of birds to come near your home. [All predictions are based on geolocation, weather predictions, and your own entries to the app].
Now that we have established a shared understanding of the usage of BirdID we will look at it through the context of the three dimensions of the Kliman-Silver et al. (2020) framework.
Personal and Social Experiences
The first key dimension distinguishes between personal and social experiences. We have determined that our design belongs at the personal end of the scale, as it is a tool to help the user identify birds as well as provide a personal guide to birds in the area.
According to Kliman-Silver et al. (2020), it is important to measure the users’ attitudes towards explainability and trust when designing with AI. Trust is an especially important topic for our design: the app uses, among other sources, data from every user to predict the probability of seeing particular species of birds. It therefore collects geolocation data, albeit anonymously, which most people would consider personal information. To help the user build trust in the system, it is important to focus on explainability. Extensive explainability, meaning how and to what extent the system explains its decisions, gives the user an understanding of how the system makes its judgments, which makes them more likely to trust it (Kliman-Silver et al., 2020).
Discretionary vs. Non-discretionary Nature of the Experience
The second key dimension differentiates between the discretionary experience, a temporary, task-based, non-committal experience, and non-discretionary, which is more enduring and does not require opting in (Kliman-Silver et al., 2020). The experience of BirdID is discretionary because the interaction starts with a user opt-in action — opening the app and taking pictures of birds — and ends when the user finishes the task — closing the app.
Intelligent System’s Level of Independence Within the Interaction
The final dimension touches on the intelligent system’s level of independence within the interaction (Kliman-Silver et al., 2020). This dimension ranges from a passive tool, through a reactive AI system responding to specific actions, to a proactive AI system that perceives user intent (Kliman-Silver et al., 2020). We have placed BirdID at the no-independence end of the scale, as our design has both passive and reactive aspects. The ability to provide the user with predictions of potential birds to spot in the near future is a passive tool. Identifying birds is, however, a reactive system responding to the specific action of taking a picture of said birds.
Designing the Product
We created an interactive digital prototype using Google’s Teachable Machine which allowed us to test specific core interactions. In addition to this, we created wireframes we could show to our test participants to help them understand the app design.
Using Figma, we created the following wireframes to form a user journey map and to show visually how the app could function in the use case scenario described earlier. As Yang et al. (2016) note, AI and machine learning wireframes should include opportunities for user adaptation and considerations about how interaction flows change over time.
The BirdID app opens with a home page where the user has to choose whether the app should use the current location or rely on a location entered by the user. This part is very important because the app uses specific geolocation data to predict which birds will be in the user’s location.
The second view is the main page where most of the AI parts show. Here we see the prediction of three different birds, a prediction which is based on the user’s own earlier input of birds as well as other users’ input. The prediction here also takes weather and season into account.
That’s also why the BirdID app shows a calendar-like overview that informs the user of near-future predictions on new birds and where to find them near you.
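To make the idea of a prediction based on user registrations more concrete, the following is a minimal sketch rather than the actual BirdID logic. It assumes that the prediction can be approximated by counting how often each species was logged under matching conditions; the `Sighting` fields, the function name and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One logged bird registration (all fields are hypothetical)."""
    species: str
    region: str
    season: str
    weather: str


def predict_birds(sightings, region, season, weather, top_n=3):
    """Rank species by how often they were logged under matching conditions."""
    counts = Counter(
        s.species
        for s in sightings
        if s.region == region and s.season == season and s.weather == weather
    )
    # most_common returns (species, count) pairs, most frequent first
    return [species for species, _ in counts.most_common(top_n)]


# Made-up sample log, only for illustration
log = [
    Sighting("Robin", "Copenhagen", "spring", "sunny"),
    Sighting("Robin", "Copenhagen", "spring", "sunny"),
    Sighting("Blackbird", "Copenhagen", "spring", "sunny"),
    Sighting("Mallard", "Copenhagen", "autumn", "rainy"),
]
print(predict_birds(log, "Copenhagen", "spring", "sunny"))
```

A real system would likely weigh the signals rather than require exact matches, but even this simple count shows why every saved registration improves the forecast for other users in the same area.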
In a case where a user sees a bird and either wants to know which type it is or just add it to their own bird log, the user can choose to use the AI-bird-recognition function via the explore button.
To identify a bird, the camera opens up and lets you take a picture. In this case, the AI easily identified that the user took a picture of a robin. This precise identification again relies on several inputs from users: not only the appearance of the bird but also location, weather and season, which make identifications more specific.
When the bird is identified, the user gets easy access to basic information about it, e.g. family, size and feeding. The user can then save the bird to their log, and thereby also add it to the AI with all the data behind it.
To ensure data quality and to maintain and further train the AI, the user has the option to accept or reject the AI’s identification. By accepting, the user confirms that the AI has identified the correct bird; by rejecting, the user informs the AI that the identification is wrong.
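The accept/reject step could be captured in a very small data structure. The sketch below is an assumption about how such feedback might be stored, not part of our prototype; the class name, method names and photo identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLog:
    """Stores the user's verdicts on the AI's identifications (hypothetical)."""
    confirmed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def record(self, photo_id, predicted_species, accepted):
        """File an identification under confirmed or rejected."""
        entry = (photo_id, predicted_species)
        if accepted:
            self.confirmed.append(entry)
        else:
            self.rejected.append(entry)

    def training_examples(self):
        """Only confirmed identifications are reused as labelled examples."""
        return list(self.confirmed)


feedback = FeedbackLog()
feedback.record("photo-001", "Robin", accepted=True)
feedback.record("photo-002", "Robin", accepted=False)
print(feedback.training_examples())
```

Keeping rejected identifications separate means they can later be reviewed or relabelled instead of silently teaching the AI a wrong answer.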
The Interactive Prototype
In addition to using wireframes, we wanted to create a prototype that uses the camera on a user’s device, to test what this meant for the interaction. We created a functioning interactive digital prototype that uses the visual input from the camera to distinguish between different birds. We used Google’s Teachable Machine to create our prototype — a tool that can be used in many ways when working with ML and allows for fast and simple ML prototyping. We trained it on four bird silhouettes and implemented the code on a website that mimics an application.
When running the prototype the user is prompted to enter their location which the app then uses to display current birds in the area. After this step, the app asks permission to use the camera of the user’s device which can then be used to distinguish between four different bird silhouettes.
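As a rough illustration of the recognition step, the sketch below assumes that the model returns one probability per trained class, which is how Teachable Machine image models report their result. The class labels and the 0.8 threshold are hypothetical, and the function is not taken from our prototype code.

```python
def top_label(probabilities, threshold=0.8):
    """Return the most likely class, or None if the model is not confident enough.

    `probabilities` maps class label -> probability (hypothetical labels);
    `threshold` is an assumed cut-off, not a value from the prototype.
    """
    label, score = max(probabilities.items(), key=lambda item: item[1])
    return label if score >= threshold else None


# Made-up model outputs for four silhouette classes
confident = {"bird_a": 0.91, "bird_b": 0.05, "bird_c": 0.03, "bird_d": 0.01}
unsure = {"bird_a": 0.40, "bird_b": 0.35, "bird_c": 0.15, "bird_d": 0.10}
print(top_label(confident))
print(top_label(unsure))
```

Showing no result when the confidence is low, rather than the best guess, is one way of keeping the system honest about what it can and cannot recognise.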
This interactive prototype takes the core functionality of the design, the information about birds in the user’s area and the bird identification function, and lets us test these specific interactions on test subjects. This enabled us to test the concept with a high fidelity prototype regarding the interactivity and visual experience of using the app while being low-fidelity regarding the content of the application.
Using an interactive prototype in combination with wireframes allowed us to focus on different fidelities. Where the interactive prototype has a high-fidelity interactive experience and a low-fidelity content experience, the wireframes have a higher fidelity regarding content and a lower fidelity regarding interactivity.
According to Nielsen Norman Group, a high-fidelity prototype has many benefits. Whereas slow response times can break users’ flow of interaction, a high-fidelity prototype allows for fast response times. High-fidelity prototypes also look and feel like “live” software, which means that test participants will behave more realistically, as if they are interacting with a real system. Lastly, high-fidelity prototypes free the designers to focus on observing the interaction as opposed to controlling a low-fidelity prototype, as would be the case in a Wizard of Oz test scenario (Pernice, 2016).
This video gives an insight into how our prototype functions. It demonstrates the use of our interactive prototype and walks through the wireframes as well.
For our tests, we recruited two of our classmates and conducted a media-go-along test, in which we gave them a brief introduction to the digital interactive prototype, familiarising them with the use scenario, before letting them use it and talk us through their thoughts while interacting with the prototype.
Afterwards, we showed them our Figma wireframes, and explained the idea of the app in greater detail. We created a shared understanding among us and our recruited classmates, based on both the prototype and the wireframes. We then moved to the final part of our tests, the questions.
We wanted to focus on trust, and comfort with the technology. As Kliman-Silver et al. (2020) point out in their article Adapting User Experience Research Methods for AI-Driven Experiences, it is important to understand how comfortable the user is with the AI technology — if the user isn’t comfortable, they are less likely to adapt to and use the technology. We don’t want the technology to feel creepy or ‘uncanny valley’-esque.
The data from user testing of the camera recognition function is the basis for the analysis. But before we go there, we would like to demonstrate how another AI part of BirdID could be tested using Wizard of Oz. The Wizard of Oz method, according to Buxton (2007), involves making a working system where the user is unaware that some or all of the system’s functions are being performed by a human operator. The objective is not to make the actual system but to prototype something that users can experience, thereby enabling one to explore design concepts in action and as experienced far earlier in the process than would otherwise be possible (Buxton, 2007). Wizard of Oz testing is particularly useful for testing AI-based systems because the human who controls the computer can simulate the AI responses based on natural intelligence (Pernice, 2016).
We did not use the Wizard of Oz technique, as we felt it was unsuited for the type of interaction we wanted to convey. Lenz, Diefenbach, and Hassenzahl (2013) present the interaction vocabulary, which can be used to create a shared language to understand and analyse interaction with digital artefacts. Of particular importance in our work with BirdID were the attributes relating to interaction feedback. An instant interaction instills a sense of security and competency in the user, while a more delayed interaction puts emphasis on the act itself, rather than the results of the interaction (Lenz et al., 2013). An instant interaction was important for us, as the user needs to trust the system’s accuracy and a delayed response could make them doubt it.
A part of BirdID which would have been ideal to test with Wizard of Oz is the app’s ability to predict birds. This feature does not need to be instant and can easily be prepared by the wizard in advance. As described, BirdID relies on location, weather and season to predict future possible sightings of birds. To showcase this, we have these different options for creating specified inputs as if the AI were producing them itself.
Combined with pictures of each choice in the table above, the wizard should let the user pick a location themselves (as if it were their current location), then the wizard randomly chooses a weather type (because the weather isn’t a user input), and finally the wizard selects which season it is.
The unique selection should then give access to a list of potential birds that could be seen in real life.
For example: Canada + Sunny + Fall = Canada goose, Sea eagle, and Red Headed woodpecker.
A change in e.g. weather should then provide a different list of birds.
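The wizard’s preparation amounts to a lookup table keyed on the three inputs. The sketch below is a hypothetical way of writing that table down; it contains only the example combination given above, and the remaining rows would have to be filled in by the wizard before the test.

```python
# Lookup table prepared by the wizard in advance.
# Key: (location, weather, season) -> list of birds to show the participant.
WIZARD_TABLE = {
    ("Canada", "Sunny", "Fall"): [
        "Canada goose",
        "Sea eagle",
        "Red Headed woodpecker",
    ],
    # Further combinations would be added here by the wizard.
}


def wizard_prediction(location, weather, season):
    """Return the prepared bird list, or an empty list if none was prepared."""
    return WIZARD_TABLE.get((location, weather, season), [])


print(wizard_prediction("Canada", "Sunny", "Fall"))
```

Because every combination has its own entry, changing a single input such as the weather leads to a different key and therefore to a different list.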
This type of testing could provide us with important knowledge about the AI function, as well as input from users on which parameters, if any, they think could give the AI more information and thereby make it even more precise in its predictions.
This could also be a way to show a user how much data an AI needs to function well, and it can lead to a conversation about trust and the use of personal data.
As a part of our user testing, we explored the issue of trust. The participants felt comfortable with the AI aspects of the concept because they had control over the interaction; in other words, because we decided to design a discretionary AI experience. They were also comfortable with sharing the location of identified birds with the app as long as they had full control over when they shared said location. They suggested that the app should share an approximate rather than a precise location, as this would make the app more private and anonymous.
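The participants’ suggestion of sharing only an approximate location could be sketched as simple rounding of the coordinates before they leave the device. This is an illustration under our own assumptions, not an implemented feature; the function name and the choice of one decimal place are hypothetical.

```python
def approximate_location(latitude, longitude, decimals=1):
    """Coarsen coordinates by rounding them before they are shared.

    One decimal place corresponds to roughly 11 km in latitude; the
    east-west distance covered by the same rounding shrinks towards the poles.
    """
    return (round(latitude, decimals), round(longitude, decimals))


# Made-up coordinates for illustration
print(approximate_location(55.6761, 12.5683))
```

The number of decimals is the trade-off between privacy and prediction quality: fewer decimals reveal less about the user but also tell the AI less about where a bird was actually seen.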
During the testing of our interactive prototype, we experienced minor problems with the technology. The AI would only identify the birds if they were positioned in a very particular way. This made scanning the birds cumbersome and at times almost impossible for the users. This issue could presumably be reduced by training the AI more thoroughly, for example on more varied images.
Both of our test participants found it odd that they were asked to confirm or reject the result of the computer vision-based bird recognition system. This confusion originated from our wireframes reusing the same picture taken by the user as the reference photo used for comparison afterwards. The intended interaction would not use a reference photo sourced by the user, but this was not explained by our wireframes. As such, the system did not make its functionality clear in accordance with the guidelines presented by Amershi et al. (2019). This was also true for our test of the prototype, where the participants wanted to freeze the image when the bird had been recognised by the system. We had split the technical part of the interaction, the ML prototype, from the aesthetic part of the interaction, covered by our wireframes. The coupling of these two parts of the test was not made clear to the participants beforehand. When we explained how the final interaction would involve photos taken on a mobile device, rather than a live webcam feed, their worries subsided.
Lastly, our testing also revealed another competing product which serves the same needs as our design. Fuglebogen is an app published by the Danish Ornithological Society (Dansk Ornitologisk Forening, 2017). One of the test participants pointed out that this app already covers our intended functions for photo recognition and bird logging. The participants felt that our design was differentiated through its predictions and forecasting. A further development of the app could focus on recognising not only bird species but specific individual birds, inspired by the work of Ferreira et al. (2020).
As mentioned, we prioritised taking advantage of the opportunity to design with real AI technology as we found it fascinating. This meant we designed an interactive prototype using Google’s Teachable Machine, which resulted in a functioning prototype with limited scope. It was only able to recognise the four pre-identified birds and was also unable to adapt based on user input. Thus, the decision to create a prototype with functioning AI technology was at the expense of collecting more thorough data about the users’ experience with the concept and AI in general.
As designers, this process has taught us that it can be difficult to conduct thorough user tests when designing AI products. We designed our prototype to fit one specific use case with four predetermined birds. We did not account for other scenarios and thus did not explore what is unique about AI: each interaction can be very different because the AI is continuously improving itself based on different data entries. If we had decided to create a prototype that relied heavily on the Wizard of Oz technique, we could possibly have made the experience of interacting with the concept better reflect the desired final product. Our focus was on the bird recognition system, rather than the more novel idea of forecasting and predicting which birds will be in your area.
Using the Wizard of Oz method could also have made it possible for us to address guidelines 12 to 18 of Amershi et al. (2019). These guidelines explore, among other things, how a system works over time: how it adapts and updates based on user behaviour (Amershi et al., 2019). Thus, by using a wizard as the bird-predicting AI, we could have collected more accurate feedback on the users’ experience of interacting with a functioning concept.
Because we focused on exploring AI technology as a design material, we neglected the UX aspect of our project. Reexamining the structure of our project, how could we include UX more prominently in our work? Hertzum (2010) introduces six usability lenses for UX. The purpose of the six lenses is to enable the designer to approach usability from multiple points of view, thus making the designer aware of the various elements and aspects that impact the use of a system. Situational usability concerns the quality-in-use of a system in a specific context (Hertzum, 2010). We find this lens especially relevant for BirdID, as the context of use is highly dependent on the user’s bird-watching experience and their specific situational needs. A seasoned ornithologist has different requirements for features and information compared to a user who wants to log which birds are nesting in their garden.
If we had conducted our work with the mindset of adapting BirdID to fit the lens of situational usability, we would have aligned our user testing and other research to the values of this lens. Thus, we would have had an approach that favoured the UX aspects of our AI-infused project, rather than the AI technology. For example, we could have used Wizard of Oz, as we have already theorised, to test both the user experience in greater detail and how it fits within the lens of situational usability.
Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E. (2019). Guidelines for human-ai interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Paper №3.
Audubon Bird Guide (n.d.). Google Play Store Reviews. Retrieved April 26 2021 from: https://play.google.com/store/apps/details?id=com.audubon.mobile.android&hl=en
Buxton, B. (2007). Sketching User Experiences: Getting the Design Right and the Right Design. San Francisco, USA: Morgan Kaufmann, pp. 105–119 & 139–147
Dansk Ornitologisk Forening (2017, January 31). Ny gratis app: Find fuglen på din mobil. Retrieved May 3 2021 from: https://www.dof.dk/om-dof/nyheder?nyhed_id=1528
Ferreira, A. C., Silva, L. R., Renna, F., et al. (2020). Deep learning‐based methods for individual recognition in small birds. Methods in Ecology and Evolution, 11, 1072–1085. https://doi.org/10.1111/2041-210X.13436
Hertzum, M. (2010). Images of Usability. International Journal of Human-Computer Interaction, 26(6), 567–600. Preprint version.
Kliman-Silver, C., Siy, O., Awadalla, K., Lentz, A., Convertino, G., & Churchill, E. (2020). Adapting user experience research methods for AI-driven experiences. Conference on Human Factors in Computing Systems — Proceedings, 1–8.
Lenz, E., Diefenbach, S., & Hassenzahl, M. (2013, September). Exploring relationships between interaction attributes and experience. In Proceedings of the 6th international conference on designing pleasurable products and interfaces. 126–135.
Merlin Bird ID by Cornell Lab (n.d.). Google Play Store Reviews. Retrieved April 26 2021 from: https://play.google.com/store/apps/details?id=com.labs.merlinbirdid.app&hl=en
Pernice, K. (2016, December 18). UX Prototypes: Low Fidelity vs. High Fidelity. Nielsen Norman Group. https://www.nngroup.com/articles/ux-prototype-hi-lo-fidelity/
Yang, Q., Zimmerman, J., Steinfeld, A., & Tomasic, A. (2016). Planning adaptive mobile experiences when wireframing. DIS 2016. 565–576.