Instructors often rely on visual actions such as pointing, marking, and sketching to convey information in educational presentation videos. These subtle visual cues often lack verbal descriptions, forcing low-vision (LV) learners to search for visual indicators or rely solely on audio, which can lead to missed information and increased cognitive load. To address this challenge, we conducted a co-design study with three LV participants and developed VeasyGuide, a tool that uses motion detection to identify instructor actions and dynamically highlight and magnify them. VeasyGuide produces familiar visual highlights that convey spatial context and adapt to diverse learners and content through extensive personalization and real-time visual feedback. VeasyGuide reduces visual search effort by clarifying what to look for and where to look. In an evaluation with 8 LV participants, learners demonstrated a significant improvement in detecting instructor actions, with faster response times and significantly reduced cognitive load. A separate evaluation with 8 sighted participants showed that VeasyGuide also enhanced engagement and attentiveness, suggesting its potential as a universally beneficial tool.
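For illustration only (not the paper's implementation), a minimal Python/OpenCV sketch of the kind of motion-based detection, highlighting, and magnification the abstract describes; the input file name, thresholds, and zoom factor are assumptions.

```python
# Illustrative sketch: frame differencing to find instructor motion (e.g.,
# pointing or sketching), highlight the moving region, and show a magnified
# inset. Thresholds, zoom factor, and "lecture.mp4" are hypothetical.
import cv2

cap = cv2.VideoCapture("lecture.mp4")   # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)              # per-pixel motion signal
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)      # merge nearby motion blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        # Highlight the largest moving region and paste a magnified inset.
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        zoom = cv2.resize(frame[y:y + h, x:x + w], None, fx=2.0, fy=2.0)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 255), 3)
        if zoom.shape[0] < frame.shape[0] and zoom.shape[1] < frame.shape[1]:
            frame[:zoom.shape[0], :zoom.shape[1]] = zoom  # inset, top-left

    cv2.imshow("highlighted", frame)
    if cv2.waitKey(30) & 0xFF == 27:                 # Esc to quit
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
```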
@inproceedings{sechayk2025veasyguide,title={VeasyGuide: Personalized Visual Guidance for Low-vision Learners on Instructor Actions in Presentation Videos},author={Sechayk, Yotam and Shamir, Ariel and Pavel, Amy and Igarashi, Takeo},booktitle={Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '25)},publisher={Association for Computing Machinery},address={Denver, CO, USA},year={2025},doi={10.1145/3663547.3746372},language={english},}
ASSETS
Task Mode: Dynamic Filtering for Task-Specific Web Navigation using LLMs
Modern web interfaces are unnecessarily complex to use as they overwhelm users with excessive text and visuals unrelated to their current goals. This problem particularly impacts screen reader users (SRUs), who navigate content sequentially and may spend minutes traversing irrelevant elements before reaching desired information compared to vision users (VUs) who visually skim in seconds. We present Task Mode, a system that dynamically filters web content based on user-specified goals using large language models to identify and prioritize relevant elements while minimizing distractions. Our approach preserves page structure while offering multiple viewing modes tailored to different access needs. Our user study with 12 participants (6 VUs, 6 SRUs) demonstrates that our approach reduced task completion time for SRUs while maintaining performance for VUs, decreasing the completion time gap between groups from 2x to 1.2x. 11 of 12 participants wanted to use Task Mode in the future, reporting that Task Mode supported completing tasks with less effort and fewer distractions. This work demonstrates how designing new interactions simultaneously for visual and non-visual access can reduce rather than reinforce accessibility disparities in future technology created by human-computer interaction researchers and practitioners.
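As an illustration rather than the Task Mode implementation, a minimal sketch of goal-based element filtering with an LLM; `call_llm` is a hypothetical stand-in for whichever chat-completion client is available, and the element tags and prompt wording are assumptions.

```python
# Illustrative sketch: given a page's elements and a user goal, ask an LLM
# which elements are relevant, then keep document order so structure is
# preserved. `call_llm(prompt) -> str` is a hypothetical stand-in.
import json
from bs4 import BeautifulSoup

def filter_page(html: str, goal: str, call_llm) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    elements = [el.get_text(" ", strip=True)
                for el in soup.find_all(["h1", "h2", "h3", "p", "a", "button", "li"])
                if el.get_text(strip=True)]

    prompt = (
        f"User goal: {goal}\n"
        "Below is a numbered list of page elements. Return a JSON array of the\n"
        "indices of elements relevant to the goal, most important first.\n" +
        "\n".join(f"{i}: {text}" for i, text in enumerate(elements))
    )
    relevant = json.loads(call_llm(prompt))          # e.g. "[0, 4, 7]"

    # Preserve original document order among the selected elements.
    keep = sorted({int(i) for i in relevant if 0 <= int(i) < len(elements)})
    return [elements[i] for i in keep]
```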
@inproceedings{mohanbabu2025taskmode,title={Task Mode: Dynamic Filtering for Task-Specific Web Navigation using LLMs},author={Mohanbabu, Ananya Gubbi and Sechayk, Yotam and Pavel, Amy},booktitle={Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '25)},publisher={Association for Computing Machinery},address={Denver, CO, USA},year={2025},doi={10.1145/3663547.3746401},language={english},}
ASSETS
A Longitudinal Autoethnography of Email Access for a Professional with Chronic Illness and ADHD: Preliminary Insights
Veronica Pimenova, Yotam Sechayk, Fabricio Murai, Andrew Hundt, and Shiri Dori-Hacohen
In Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ’25), 2025
Email is a foundational infrastructure of professional environments, yet for chronically ill and neurodivergent individuals, it often becomes an invisible barrier to access. We share preliminary insights from a 14-year autoethnography of a professional with chronic illness and attention-deficit/hyperactivity disorder (ADHD). We detail this professional’s iterative adaptation of mainstream email features into Mail++, their personalized workplace communication workflow for managing executive function challenges and chronic illness flares. We propose three emerging themes: (1) from hacks to assistive technology, (2) evolving access needs, and (3) toll of inaccessible systems. Based on our findings, we present initial design insights for accessible workplace communication systems. As future work in this ongoing study, we discuss a more in-depth qualitative analysis of the autoethnographic data, and formal user testing of the Mail++ approach with a population of professionals with chronic illness and ADHD to better inform the design of assistive workplace technology.
@inproceedings{pimenova2025longitudinal,title={A Longitudinal Autoethnography of Email Access for a Professional with Chronic Illness and ADHD: Preliminary Insights},author={Pimenova, Veronica and Sechayk, Yotam and Murai, Fabricio and Hundt, Andrew and Dori-Hacohen, Shiri},booktitle={Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '25)},publisher={Association for Computing Machinery},address={Denver, CO, USA},year={2025},doi={10.1145/3663547.3759764},language={english},note={poster/demo},}
DIS
ImprovMate: Multimodal AI Assistant for Improv Actor Training
Improvisation training for actors presents unique challenges, particularly in maintaining narrative coherence and managing cognitive load during performances. Previous research on AI in improvisation performance often predates advances in large language models (LLMs) and relies on human intervention. We introduce ImprovMate, which leverages LLMs as GPTs to automate the generation of narrative stimuli and cues, allowing actors to focus on creativity without keeping track of plot or character continuity. Based on insights from professional improvisers, ImprovMate incorporates exercises that mimic live training, such as abrupt story resolution and reactive thinking exercises, while maintaining coherence via reference tables. By balancing randomness and structured guidance, ImprovMate provides a groundbreaking tool for improv training. Our pilot study revealed that actors are open to embracing AI techniques when they mirror traditional practices, and that they appreciated the fresh twist introduced by the AI-generated cues.
@inproceedings{drago2025improvmate,author={Drago, Riccardo and Sechayk, Yotam and Dogan, Mustafa Doga and Sanna, Andrea and Igarashi, Takeo},title={ImprovMate: Multimodal AI Assistant for Improv Actor Training},year={2025},isbn={9798400714863},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3715668.3736363},doi={10.1145/3715668.3736363},booktitle={Companion Publication of the 2025 ACM Designing Interactive Systems Conference (DIS '25 Companion)},pages={526--532},numpages={7},language={english},note={work-in-progress},demo={https://tomfluff.github.io/ImprovMate/}}
SIGGRAPH
Confidence Estimation of Few-shot Patch-based Learning for Anime-style Colorization
In hand-drawn anime production, automatic colorization is used to boost productivity, where line drawings are automatically colored based on reference frames. However, the results sometimes include wrong color estimations, requiring artists to carefully inspect each region and correct colors—a time-consuming and labor-intensive task. To support this process, we propose a confidence estimation method that indicates the confidence level of colorization for each region of the image. Our method compares local patches in the colorized result and the reference frame.
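For illustration only, and not the paper's method: a naive per-region confidence score based on comparing local color statistics between the colorized result and the reference frame. The patch size, the mean-color distance, and the assumption of spatially aligned frames are illustrative choices.

```python
# Illustrative sketch: compare a local patch in the colorized frame against
# the corresponding patch in the reference frame; similar colors -> high
# confidence. Assumes aligned frames; patch size and decay are hypothetical.
import numpy as np

def region_confidence(colorized: np.ndarray, reference: np.ndarray,
                      cy: int, cx: int, patch: int = 16) -> float:
    """Return a confidence in [0, 1] for the region centered at (cy, cx)."""
    h, w = colorized.shape[:2]
    y0, y1 = max(cy - patch, 0), min(cy + patch, h)
    x0, x1 = max(cx - patch, 0), min(cx + patch, w)

    # Mean color of the local patch in each image.
    col = colorized[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    ref = reference[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)

    # Map color distance to a confidence score: identical patches -> 1.0.
    dist = np.linalg.norm(col.astype(float) - ref.astype(float))
    return float(np.exp(-dist / 32.0))
```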
@inproceedings{ji2025confidence,author={Ji, Yuexiang and Maejima, Akinobu and Sechayk, Yotam and Koyama, Yuki and Igarashi, Takeo},title={Confidence Estimation of Few-shot Patch-based Learning for Anime-style Colorization},year={2025},isbn={9798400715495},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3721250.3742964},doi={10.1145/3721250.3742964},booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Posters (SIGGRAPH Posters '25)},articleno={40},numpages={2},keywords={Automatic colorization, Line drawing, Confidence estimation},series={SIGGRAPH Posters '25},language={english},note={poster/demo},}
2024
INTERACTION
MyStoryKnight: A Character-drawing Driven Storytelling System Using LLM Hallucinations
Storytelling is a valuable tradition that plays a crucial role in child development, fostering creativity and a sense of agency. However, many children consume stories passively, missing out on the opportunity to participate in the creative process. To address this, we propose a storytelling system that creates adventure-type stories with multiple branches that users can explore. We generate these interactive stories from a character drawing, extracting its visual features with GPT-4. By leveraging LLM hallucinations, we continue the interactive story using user feedback as a prompt. Finally, we refine the quality of the generated story through a complexity analysis algorithm. We believe that using a drawing as input further improves engagement with the story and its characters.
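As an illustration, not the paper's pipeline: a minimal sketch of a branching story tree grown from a character's visual features and reader choices. `describe_drawing` and `call_llm` are hypothetical stand-ins for the GPT-4 vision and text calls.

```python
# Illustrative sketch: a branching story tree whose continuations are produced
# by an LLM from the character's traits and the reader's chosen action.
from dataclasses import dataclass, field

@dataclass
class StoryNode:
    text: str
    choices: dict[str, "StoryNode"] = field(default_factory=dict)

def grow(node: StoryNode, character: dict, choice: str, call_llm) -> StoryNode:
    """Create one new branch under `node` for the reader's chosen action."""
    prompt = (
        f"Character traits: {character}\n"
        f"Story so far: {node.text}\n"
        f"The reader chooses to: {choice}\n"
        "Continue the adventure in 2-3 sentences and end on a new decision point."
    )
    branch = StoryNode(text=call_llm(prompt))
    node.choices[choice] = branch
    return branch

# Usage (hypothetical helpers):
# character = describe_drawing("knight_drawing.png")  # visual features via GPT-4
# root = StoryNode(text=call_llm(f"Start an adventure about: {character}"))
# grow(root, character, "explore the cave", call_llm)
```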
@inproceedings{sechayk2024mystoryknight,author={Sechayk, Yotam and Penarska, Gabriela A. and Randsalu, Ingrid A. and Cruz, Christian Arzate and Igarashi, Takeo},title={MyStoryKnight: A Character-drawing Driven Storytelling System Using LLM Hallucinations},booktitle={インタラクション2024論文集 (IPSJ INTERACTION 2024 Proceedings)},year={2024},month=feb,pages={1297--1300},publisher={情報処理学会 (IPSJ)},address={Japan},note={poster/demo},language={english},demo={https://tomfluff.github.io/MyStoryKnight/}}
CHI
SmartLearn: Visual-Temporal Accessibility for Slide-based e-learning Videos
In the realm of e-learning, video-based content is increasingly prevalent but brings with it unique accessibility challenges. Our research, beginning with a formative study involving 53 participants, has pinpointed the primary accessibility barriers in video-based e-learning: mismatches in user pace, complex visual arrangements leading to unclear focus, and difficulties in navigating content. To tackle these barriers, we introduced SmartLearn (SL), an innovative tool designed to enhance the accessibility of video content. SL utilizes advanced video analysis techniques to address issues of focus, navigation, and pacing, enabling users to interact with video segments more effectively through a web interface. A subsequent evaluation demonstrated that SL significantly enhances user engagement, ease of access, and learnability over existing approaches. We conclude by presenting design guidelines derived from our study, aiming to promote future efforts in research and development towards a more inclusive digital education landscape.
@inproceedings{sechayk2024smartlearn,author={Sechayk, Yotam and Shamir, Ariel and Igarashi, Takeo},title={SmartLearn: Visual-Temporal Accessibility for Slide-based e-learning Videos},year={2024},isbn={9798400703317},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3613905.3650883},doi={10.1145/3613905.3650883},booktitle={Extended Abstracts of the CHI Conference on Human Factors in Computing Systems},articleno={294},numpages={11},keywords={Accessibility, E-learning, Online learning, Temporal Accessibility, Universal Design, Video Accessibility, Visual Accessibility},location={Honolulu, HI, USA},language={english},note={late breaking work},series={CHI EA '24},}
RO-MAN
Data Augmentation for 3DMM-based Arousal-Valence Prediction for HRI
Humans use multiple communication channels to interact with each other. For instance, body gestures or facial expressions are commonly used to convey an intent. The use of such non-verbal cues has motivated the development of prediction models. One such approach is predicting arousal and valence (AV) from facial expressions. However, making these models accurate for human-robot interaction (HRI) settings is challenging as it requires handling multiple subjects, challenging conditions, and a wide range of facial expressions. In this paper, we propose a data augmentation (DA) technique to improve the performance of AV predictors using 3D morphable models (3DMM). We then utilize this approach in an HRI setting with a mediator robot and a group of three humans. Our augmentation method creates synthetic sequences for underrepresented values in the AV space of the SEWA dataset, which is the most comprehensive dataset with continuous AV labels. Results show that using our DA method improves the accuracy and robustness of AV prediction in real-time applications. The accuracy of our models on the SEWA dataset is 0.793 for arousal and valence.
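For illustration only: a minimal sketch of locating underrepresented bins of the arousal-valence plane, for which synthetic sequences could then be generated. The grid size, count threshold, assumed value range of [-1, 1], and the `synthesize` helper are illustrative assumptions; the paper's 3DMM-based rendering step is not shown.

```python
# Illustrative sketch: bin the arousal-valence plane and report the bins with
# too few training samples, as targets for synthetic-sequence generation.
import numpy as np

def underrepresented_bins(arousal: np.ndarray, valence: np.ndarray,
                          grid: int = 8, min_count: int = 50):
    """Return (a_lo, a_hi, v_lo, v_hi) boxes with fewer than min_count samples."""
    counts, a_edges, v_edges = np.histogram2d(
        arousal, valence, bins=grid, range=[[-1, 1], [-1, 1]])
    boxes = []
    for i in range(grid):
        for j in range(grid):
            if counts[i, j] < min_count:
                boxes.append((a_edges[i], a_edges[i + 1],
                              v_edges[j], v_edges[j + 1]))
    return boxes

# for a_lo, a_hi, v_lo, v_hi in underrepresented_bins(train_arousal, train_valence):
#     target = (np.random.uniform(a_lo, a_hi), np.random.uniform(v_lo, v_hi))
#     synthesize(target)  # hypothetical: render a 3DMM sequence toward this AV target
```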
@inproceedings{sechayk2024data,author={Sechayk, Yotam and Cruz, Christian Arzate and Igarashi, Takeo and Gomez, Randy},booktitle={2024 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)},title={Data Augmentation for 3DMM-based Arousal-Valence Prediction for HRI},publisher={IEEE},year={2024},pages={2015--2022},keywords={Solid modeling;Accuracy;Three-dimensional displays;Human-robot interaction;Predictive models;Feature extraction;Data augmentation;Data models;Robustness;Robots},doi={10.1109/RO-MAN60168.2024.10731438},language={english},}