Trivory hosts the majority of announcement-based communication from school administrators to students, parents, and other interested parties in Portland Public Schools. Despite this, many schools continue to send newsletters and emails alongside their Trivory posts. As a result, administrators make the same announcement up to three times, bloating their communications and creating needless extra work for already busy people. To strengthen its appeal to its main clientele, Trivory must reduce the workload of the administrators who pay for its licenses while reliably delivering the news they want the student body to hear. With this context, Terren Gurule, founder and CEO of Trivory, approached me to help explore algorithmic improvements to Trivory's newsfeed that would motivate more students and parents to download and use the Trivory app and encourage administrators to consolidate their communication on Trivory. We quickly formulated a problem statement before exploring possible solutions:
How might we make administrators feel more comfortable using Trivory as the sole platform for their communication?
From there, we validated our concerns before brainstorming approaches. Terren had already heard feedback from administrators indicating that they felt uncomfortable phasing out newsletters and emails for two major reasons. First, Trivory sends a notification to every student and parent at a school whenever a new post is made. Administrators don't want to cause notification saturation, so they try to reserve Trivory for more important communications. Second, not all students and parents in the district have or use the Trivory app, so administrators must use other channels to reach them. These problems point to two features Trivory lacks: a judicious notification system and a way to send Trivory news to people outside the Trivory system. This case study focuses on the former.
To validate these concerns and potential solutions, we conducted a survey of current Trivory users. Responses included:
"I would like an option to turn off sports notifications"
"I wish you could get notifications depending on the clubs you are in"
"When downloading the app, Trivory should ask what clubs and activities you are a part of, so you don't get notified about things that aren't relevant"
These results showed demand for reduced notifications, particularly in the form of summaries for parents, who might not need to know about everything immediately. Contrary to our expectations, only a minority of respondents found the current notification frequency problematic. Combined with feedback from administrators, we theorize that this is because administrators are already artificially throttling how often they post. Moving forward, we decided to approach the problem by providing options and letting users decide how they are notified.
To predict whether a user should see a post, we want to answer two main questions about it: who is the post about, and what is it about? We decided to call the answers to these questions "topics." With topics, we can decide whether a user should or shouldn't see a post.
To generate good personalized summaries, we need to know what types of posts users want to see.
While topics give us the exact information we need for sorting, generating topics consistently and accurately is a difficult task, even for a human. We want topics to be accurate, precise, and relevant.
One of the biggest hurdles was that many posts on Trivory contain important information only in images. While we can easily use an OCR tool to extract some of that text, a naive keyword-based topic generation system will still struggle with accuracy. For instance, if the words "example text" are arranged vertically, the OCR might output "example" and "text" far apart from each other. A system that searches the post for "example text" will then fail to give the post the "example text" topic.
To make topic generation more intelligent, we decided to leverage LLMs. The LLMs currently available through APIs vary in accuracy, precision, and relevance, and each model's performance is highly dependent on the prompt. All models were given similar prompts, but differences in how well each model works with a given prompt could be responsible for some of the issues we observed.
To generate reliable responses, we need to restrict the LLM's output to a set of acceptable topics, which limits the variance in its outputs. However, topics inherently change over time: if a new club is established, it needs its own topic. But it is hard to maintain relevance and precision if a diverse group of administrators can all add topics as they see fit.
We decided to test a system that allows administrators to approve or deny topics recommended by our LLMs. Because the LLM works from the existing topic list, we can expect it not to suggest new topics that already exist in one form or another, while the approval step still leaves flexibility for genuinely new ones.
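To make the OCR problem concrete, here is a minimal Python sketch of a naive keyword matcher run against hypothetical OCR output. The flyer text and the helpers `naive_has_topic` and `words_present` are illustrative assumptions, not part of Trivory.

```python
# Hypothetical OCR output for a flyer where "example" and "text" are printed
# vertically: the OCR engine reads them as separate lines with other text between.
ocr_output = "example\nSpring Concert\nFriday 7 PM\ntext"
post_body = "See the attached flyer for details."
searchable = post_body + "\n" + ocr_output


def naive_has_topic(text: str, phrase: str) -> bool:
    """Tag a post only if the exact topic phrase appears in its text."""
    return phrase.lower() in text.lower()


def words_present(text: str, phrase: str) -> bool:
    """Looser check: every word of the phrase appears somewhere in the text."""
    words = set(text.lower().split())
    return all(word in words for word in phrase.lower().split())


print(naive_has_topic(searchable, "example text"))  # False: the phrase was split apart
print(words_present(searchable, "example text"))    # True, but unrelated posts with both words match too
```

Loosening the match to "every word appears somewhere" recovers this post, but it would also tag unrelated posts that happen to contain both words, which is exactly the accuracy problem a smarter system needs to avoid.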
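One way to enforce that restriction is to ask the LLM for structured output and validate it in code. The sketch below assumes a hypothetical JSON reply of the form `{"topics": [...]}` and a hypothetical `ALLOWED_TOPICS` set; any suggestion outside the set is kept aside as a proposal instead of being applied to the post.

```python
import json

# Hypothetical approved topic list for one school.
ALLOWED_TOPICS = {"sports", "soccer", "girls' soccer", "boys' soccer", "band"}


def parse_topics(llm_response: str, allowed: set[str]) -> tuple[set[str], set[str]]:
    """Split an LLM reply into (approved topics, proposed new topics).

    Assumes the model was prompted to reply with JSON like {"topics": [...]}.
    """
    try:
        suggested = json.loads(llm_response)["topics"]
        normalized = {str(t).strip().lower() for t in suggested}
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed output: apply no topics rather than guess.
        return set(), set()
    return normalized & allowed, normalized - allowed


reply = json.dumps({"topics": ["Soccer", "Girls' Soccer", "Chess Club"]})
accepted, proposed = parse_topics(reply, ALLOWED_TOPICS)
print(sorted(accepted), sorted(proposed))  # ["girls' soccer", 'soccer'] ['chess club']
```

Treating malformed output as "no topics" is a conservative choice; a production system might retry the request instead.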
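The approval workflow can be sketched as a small registry. `TopicRegistry` and its methods are hypothetical names for illustration; real storage, permissions, and per-school scoping are left out.

```python
from dataclasses import dataclass, field


def _normalize(topic: str) -> str:
    return topic.strip().lower()


@dataclass
class TopicRegistry:
    """Hypothetical per-school topic list with a queue of LLM-suggested topics."""

    approved: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)

    def propose(self, topic: str) -> None:
        # New LLM suggestions wait for an administrator; known topics are ignored.
        topic = _normalize(topic)
        if topic not in self.approved:
            self.pending.add(topic)

    def approve(self, topic: str) -> None:
        topic = _normalize(topic)
        self.pending.discard(topic)
        self.approved.add(topic)

    def deny(self, topic: str) -> None:
        self.pending.discard(_normalize(topic))

    def prompt_topic_list(self) -> str:
        # Included in each prompt so the LLM reuses existing topics
        # rather than inventing near-duplicates.
        return ", ".join(sorted(self.approved))


registry = TopicRegistry(approved={"sports", "soccer"})
registry.propose("Chess Club")  # new club, queued for review
registry.propose("Soccer")      # already approved, ignored
registry.propose("chess")       # near-duplicate, queued
registry.approve("chess club")
registry.deny("chess")
print(registry.prompt_topic_list())  # chess club, soccer, sports
print(registry.pending)              # set()
```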
Now that we know what posts are about, we need to figure out what types of posts users want to see.
Since feedback wasn't part of Trivory's design until this point, we had two jobs: add a post feedback mechanic that fits Trivory's existing identity, and implement an algorithm that uses feedback to build user profiles. After researching other platforms with feeds, we identified three common methods for soliciting feedback from users.
We can translate a user's behavior under these three methods into a numeric measure of interest in a given topic, computed as a weighted average of their responses to posts in that topic.
However, data is sparse and responses are not very specific. How can we improve our understanding of which topics our users value most? With collaborative filtering, we should be able to predict a user's interests from patterns in the interests of other users.
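As an illustration, the weighted average could look like the Python sketch below. The three feedback methods are not named in this section, so the `like`, `dislike`, and `open` signals and their scores and weights are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical signals mapped to (score, weight). Explicit feedback gets a
# higher weight than the implicit "open" signal.
SIGNALS = {
    "like": (1.0, 3.0),
    "dislike": (-1.0, 3.0),
    "open": (1.0, 1.0),
}


def topic_interest(events: list[tuple[list[str], str]]) -> dict[str, float]:
    """Per-topic weighted average of a user's responses, in [-1, 1].

    Each event is (topics on the post, signal name).
    """
    weighted_sum: dict[str, float] = defaultdict(float)
    total_weight: dict[str, float] = defaultdict(float)
    for topics, signal in events:
        score, weight = SIGNALS[signal]
        for topic in topics:
            weighted_sum[topic] += score * weight
            total_weight[topic] += weight
    return {topic: weighted_sum[topic] / total_weight[topic] for topic in weighted_sum}


events = [
    (["sports", "soccer"], "like"),
    (["sports", "soccer"], "open"),
    (["band"], "dislike"),
    (["band"], "open"),
]
print(topic_interest(events))  # {'sports': 1.0, 'soccer': 1.0, 'band': -0.5}
```

Weighting explicit feedback above implicit signals reflects that a deliberate reaction says more than an opened post; actual weights would need tuning against real behavior.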
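A common way to fill in those gaps is user-based collaborative filtering: estimate a user's interest in an unscored topic from the scores of users whose profiles look similar. The sketch below assumes profiles are dictionaries of per-topic interest scores in [-1, 1], compared with cosine similarity; the user names and numbers are made up.

```python
import math
from typing import Optional

Profiles = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two users over the topics both have scores for."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user: str, topic: str, profiles: Profiles) -> Optional[float]:
    """Estimate `user`'s interest in `topic` from similar users who have scored it."""
    numerator = denominator = 0.0
    for other, scores in profiles.items():
        if other == user or topic not in scores:
            continue
        similarity = cosine(profiles[user], scores)
        if similarity <= 0:
            continue  # learn only from users with broadly similar tastes
        numerator += similarity * scores[topic]
        denominator += similarity
    return numerator / denominator if denominator else None


profiles = {
    "ana": {"soccer": 1.0, "band": -0.5},
    "ben": {"soccer": 1.0, "band": -0.5, "track": 0.8},
    "cam": {"soccer": -1.0, "band": 1.0, "track": -0.6},
}
print(predict("ana", "track", profiles))  # approximately 0.8, learned from ben
```

Comparing users only on the topics they share keeps sparse profiles usable, at the cost of noisy similarities when two users overlap on just one or two topics.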
Topics are a helpful paradigm for manual sorting, and transparency is a core value of the Trivory brand, so we wanted to explore how topics could fit into the Trivory ecosystem.
The topic system comes with two main problems:
- Determining Scope
- Filtering
To illustrate these problems, consider a scenario. We have a user who loves girls' soccer and wants to attend every girls' soccer match this year. However, the user is not interested in boys' soccer. We can expect a girls' soccer match announcement to include topics like sports, soccer, and girls' soccer. Similarly, a boys' soccer match announcement should include topics like sports, soccer, and boys' soccer. If our user indicates they don't want to see boys' soccer posts, the naive approach is to remove every post tagged boys' soccer from their feed.
However, what if the school schedules a double-header, with matches for both the boys' and girls' soccer teams? This post will have topics like sports, soccer, girls' soccer, and boys' soccer. If we hide all posts tagged boys' soccer, we'll hide this post too, and our user will miss a girls' soccer match. Instead, our sorting algorithm should recognize that a post tagged with both girls' and boys' soccer contains something our user wants to see.
Our team defined two questions our algorithm needs to answer to handle this situation correctly: "Which topics add new information to a post?" and "Does this post contain any desired information?"
To answer these, we established the idea of topic hierarchies. Topics like "boys' soccer" imply other topics, such as "sports" and "soccer," so we define parent-child relationships between them. In this case, "sports" is the parent of "soccer," which is in turn the parent of "boys' soccer." When determining which topics add new information to a post, we can then check the topic hierarchy.
Since sports, soccer, and boys' soccer form a single line of descent, we can safely say a post with all three is about boys' soccer alone and filter it from our example user's feed. However, girls' soccer shares an ancestry tree with boys' soccer, since both are children of "soccer." As a result, simply traversing the hierarchy does not solve our problem.
The key here is direct lineage. Suppose our user instead didn't want any "soccer" posts. Our double-header announcement containing boys' and girls' soccer should then be hidden, because every other topic on it is a direct ancestor or descendant of "soccer."
When boys' soccer is the undesired topic, however, girls' soccer is a sibling of boys' soccer. Since the two do not form a direct lineage, we can decide to show this post, as it contains information about a relevant topic: girls' soccer.
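The naive approach and its failure on the double-header can be reproduced in a few lines of Python. Posts are modeled as hypothetical sets of topic strings, and `naive_filter` is an illustrative name.

```python
def naive_filter(posts: list[set[str]], blocked: set[str]) -> list[set[str]]:
    """Drop every post that carries any blocked topic."""
    return [topics for topics in posts if not topics & blocked]


girls_match = {"sports", "soccer", "girls' soccer"}
boys_match = {"sports", "soccer", "boys' soccer"}
double_header = {"sports", "soccer", "girls' soccer", "boys' soccer"}

feed = naive_filter([girls_match, boys_match, double_header], blocked={"boys' soccer"})
print(feed == [girls_match])  # True: the double-header was dropped along with boys_match
```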
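A minimal sketch of the hierarchy check, assuming each topic has at most one parent and the hierarchy has no cycles. `PARENT`, `ancestors`, and `informative_topics` are hypothetical names; a topic adds new information only if no other topic on the same post implies it.

```python
# Hypothetical topic hierarchy: each topic maps to its parent (None for roots).
PARENT = {
    "sports": None,
    "soccer": "sports",
    "girls' soccer": "soccer",
    "boys' soccer": "soccer",
}


def ancestors(topic: str) -> set[str]:
    """Every topic implied by `topic`, found by walking up the parent chain."""
    found = set()
    parent = PARENT.get(topic)
    while parent is not None:
        found.add(parent)
        parent = PARENT.get(parent)
    return found


def informative_topics(topics: set[str]) -> set[str]:
    """Drop topics implied by another topic on the same post."""
    implied = set().union(*(ancestors(t) for t in topics))
    return topics - implied


double_header = {"sports", "soccer", "girls' soccer", "boys' soccer"}
print(informative_topics({"sports", "soccer", "boys' soccer"}))  # {"boys' soccer"}
print(sorted(informative_topics(double_header)))  # ["boys' soccer", "girls' soccer"]
```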
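Combining both questions gives one possible filter, sketched below with a hypothetical `PARENT` map and assuming each topic has at most one parent: reduce a post to the topics that add new information, then show it if any of those topics falls outside every blocked topic's lineage. Checking only the informative topics also means a general "sports" post is not hidden just because "sports" is an ancestor of a blocked topic.

```python
# Hypothetical topic hierarchy: each topic maps to its parent (None for roots).
PARENT = {
    "sports": None,
    "soccer": "sports",
    "girls' soccer": "soccer",
    "boys' soccer": "soccer",
}


def ancestors(topic: str) -> set[str]:
    """Every topic implied by `topic`, found by walking up the parent chain."""
    found = set()
    parent = PARENT.get(topic)
    while parent is not None:
        found.add(parent)
        parent = PARENT.get(parent)
    return found


def informative_topics(topics: set[str]) -> set[str]:
    """Question 1: keep only topics not implied by another topic on the post."""
    implied = set().union(*(ancestors(t) for t in topics))
    return topics - implied


def at_or_below(topic: str, blocked: str) -> bool:
    """True if `topic` is `blocked` itself or one of its descendants."""
    return topic == blocked or blocked in ancestors(topic)


def should_show(topics: set[str], blocked: set[str]) -> bool:
    """Question 2: show the post if any informative topic escapes every blocked lineage."""
    informative = informative_topics(topics)
    if not informative:
        return True  # an untagged post cannot match a blocked topic
    return any(
        not any(at_or_below(t, b) for b in blocked) for t in informative
    )


double_header = {"sports", "soccer", "girls' soccer", "boys' soccer"}
print(should_show(double_header, blocked={"boys' soccer"}))  # True: girls' soccer is a sibling
print(should_show(double_header, blocked={"soccer"}))        # False: both informative topics descend from soccer
print(should_show({"sports", "soccer", "boys' soccer"}, blocked={"boys' soccer"}))  # False
print(should_show({"sports"}, blocked={"boys' soccer"}))     # True: a general sports post stays
```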