
Image: fcafotodigital via Getty Images
In the race to develop AI that understands complex images like financial forecasts, medical diagrams, and nutrition labels—essential for AI to operate independently in everyday settings—closed-source systems like ChatGPT and Claude currently set the pace. But no one outside their makers knows how those models were trained or what data they used, leaving open-source alternatives scrambling to catch up.
Now, researchers at Penn Engineering and the Allen Institute for AI (Ai2) have developed a new approach to train open-source models: using AI to create scientific figures, charts, and tables that teach other AI systems how to interpret complex visual information.
Their tool, CoSyn (short for Code-Guided Synthesis), taps open-source AI models’ coding skills to render text-rich images and generate relevant questions and answers, giving other AI systems the data they need to learn how to “see” and understand scientific figures. The research is detailed in a paper for ACL 2025, a global AI conference.
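The core idea, that an image rendered from code comes with ground truth for free, can be illustrated with a toy sketch. This is not CoSyn's actual pipeline: in CoSyn a language model writes the rendering code, whereas here a fixed SVG template stands in for that step, and the function name, categories, and questions are all hypothetical.

```python
import random


def make_chart_example(seed: int = 0):
    """Return (svg_markup, qa_pairs) for one synthetic bar chart.

    Hypothetical helper for illustration only; a fixed template replaces
    the model-written rendering code described in the article.
    """
    rng = random.Random(seed)
    labels = ["Protein", "Fat", "Carbs", "Fiber"]  # made-up categories
    # Distinct values, so "tallest bar" has exactly one correct answer.
    values = dict(zip(labels, rng.sample(range(1, 41), len(labels))))

    # Step 1: render the image from code (SVG text keeps this stdlib-only).
    parts = []
    for i, (label, value) in enumerate(values.items()):
        x = 20 + i * 60
        height = value * 3
        parts.append(f'<rect x="{x}" y="{150 - height}" width="40" height="{height}"/>')
        parts.append(f'<text x="{x}" y="165">{label}</text>')
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="280" height="180">'
        + "".join(parts)
        + "</svg>"
    )

    # Step 2: derive questions and answers from the same underlying data,
    # so every answer is correct by construction.
    tallest = max(values, key=values.get)
    qa_pairs = [
        ("Which category has the tallest bar?", tallest),
        ("What is the value for Fiber?", str(values["Fiber"])),
    ]
    return svg, qa_pairs


if __name__ == "__main__":
    svg, qa_pairs = make_chart_example(seed=1)
    for question, answer in qa_pairs:
        print(question, "->", answer)
```

Because the chart and the answers come from the same data, no human labeling is needed, which is the property the researchers describe exploiting at scale.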
“This is like taking a student who’s great at writing and asking them to teach someone how to draw, just by describing what the drawing should look like,” says Yue Yang, co-first author and research scientist at Ai2’s PRIOR: Perceptual Reasoning and Interaction Research group.
“Training AI with CoSyn is incredibly data efficient,” says Mark Yatskar, assistant professor in CIS and Yang’s doctoral co-advisor. “We’re showing that synthetic data can help models generalize to real-world scenarios that could be unique to a person’s needs, like reading a nutrition label for someone with low vision.”
By building CoSyn entirely with open-source tools, the researchers hope to democratize access to powerful vision-language training methods without the ethical and legal challenges surrounding web scraping and copyrighted content.
Read more at Penn Engineering Today.
Ian Scheffler