Project Story
Inspiration
Academic conferences are where research comes alive, but there is a hidden bottleneck that costs researchers millions of hours annually: creating conference posters. We observed colleagues spending 6-8 hours designing a single poster, struggling to compress months of research into one visually coherent page that balances content, design, and readability.
Current tools offer static templates that ignore research domains, personal style, and audience needs. Even state-of-the-art multimodal large language models like GPT-4 and Claude struggle with the multi-step reasoning, iterative design, and domain-specific layout knowledge that scientific poster generation requires: in a single pass they cannot simultaneously optimize content extraction, visual hierarchy, and layout constraints without losing scientific rigor.
The breaking point came when we realized that every poster regeneration costs $2-5 in API calls, and researchers typically create 3-5 iterations per conference. At scale, this compounds into $2-5M annually in computational waste alone for the academic community.
We asked ourselves: what if the system became smarter and cheaper with every poster it created? What if user feedback could drive both quality improvements and cost reductions simultaneously?
What it does
Evolving Creativity is a self-evolving paper-to-poster pipeline that automatically converts research papers into conference-ready posters while continuously learning from user interactions. The system addresses three core challenges:
Automated Conversion: The pipeline processes PDF research papers through multi-agent orchestration, extracting text, figures, tables, and equations, then generating publication-quality posters that maintain scientific accuracy while optimizing visual presentation.
Continuous Learning: The system captures user feedback through structured interactions including A/B layout comparisons, design element ratings, and content prioritization. This feedback updates a personalized preference model defined as:
$$P_{\text{user}}(t) = \alpha \cdot P_{\text{user}}(t-1) + (1-\alpha) \cdot F_{\text{new}}$$
where \(\alpha\) is the exponential smoothing factor (the weight retained by the previous profile, so \(1-\alpha\) acts as the learning rate on new feedback) and \(F_{\text{new}}\) represents newly collected feedback.
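A minimal sketch of this update, assuming preferences are stored as per-dimension weight vectors (the dimension names and values below are illustrative, not the production schema):

```python
import numpy as np

def update_preferences(p_prev: np.ndarray, f_new: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Exponentially smooth the user preference vector.

    p_prev: previous preference weights (e.g., figure prominence, text density, color intensity)
    f_new:  weights implied by the latest round of structured feedback
    alpha:  retention factor; higher alpha means slower adaptation to new feedback
    """
    return alpha * p_prev + (1.0 - alpha) * f_new

# Illustrative usage: three design dimensions, one feedback round
p = np.array([0.5, 0.5, 0.5])          # neutral starting profile
feedback = np.array([0.9, 0.2, 0.6])   # user favored figure-heavy, low-density layouts
p = update_preferences(p, feedback, alpha=0.7)
```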
Cost Optimization: Through intelligent caching of design patterns, layout templates, and validated content extractions, the system progressively reduces generation costs. The cost function evolves as:
$$C(n) = C_0 \cdot (1 - r)^n + C_{\text{base}}$$
where \(n\) is the number of iterations, \(r\) is the reduction rate from caching, and \(C_{\text{base}}\) represents irreducible computational overhead.
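As a quick illustration of the decay (the parameters below are placeholders chosen to roughly match the $4.50 to $1.80 trajectory reported later, not fitted values):

```python
def generation_cost(n: int, c0: float = 4.20, r: float = 0.18, c_base: float = 0.30) -> float:
    """Expected cost of the n-th regeneration: C(n) = C0 * (1 - r)^n + C_base."""
    return c0 * (1.0 - r) ** n + c_base

# Print the cost curve over the first few iterations
for n in range(6):
    print(f"iteration {n}: ${generation_cost(n):.2f}")
```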
Framework Agnostic: The system works seamlessly across multiple agentic frameworks including LangChain, AutoGen, CrewAI, and custom implementations through an abstract agent communication protocol.
How we built it
Architecture Design: We implemented a multi-agent pipeline with specialized components:
The Parser Agent extracts structural elements from PDF papers using PyMuPDF and pdfplumber, identifying sections, figures, tables, and mathematical expressions. The Content Agent performs semantic analysis to identify key findings, methodology, and results using embedding-based similarity and importance scoring. The Design Agent generates layout recommendations based on content hierarchy, research domain, and accumulated user preferences. The Renderer Agent produces final posters using LaTeX templates with fallback to HTML/CSS rendering.
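A simplified view of how these agents are chained; the class and method names here are illustrative stand-ins for the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class PosterJob:
    pdf_path: str
    sections: dict = field(default_factory=dict)    # filled by the Parser Agent
    key_points: list = field(default_factory=list)  # filled by the Content Agent
    layout: dict = field(default_factory=dict)      # filled by the Design Agent
    output_path: str = ""                           # filled by the Renderer Agent

def run_pipeline(job: PosterJob, parser, content, design, renderer) -> PosterJob:
    """Run the four specialized agents in sequence; each agent only writes its own fields."""
    job.sections = parser.extract(job.pdf_path)       # text, figures, tables, equations
    job.key_points = content.summarize(job.sections)  # findings, methodology, results
    job.layout = design.propose(job.key_points)       # hierarchy- and preference-aware layout
    job.output_path = renderer.render(job.layout)     # LaTeX first, HTML/CSS fallback
    return job
```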
Self-Evolution System: We built three core modules for continuous improvement:
The Feedback Collector implements structured choice mechanisms including binary comparisons, Likert scales, and drag-and-drop prioritization. The Preference Engine maintains weighted user profiles with exponential smoothing and domain-specific priors learned from 500 published posters per research category. The Cost Optimizer tracks token usage across all LLM calls and maintains a Redis cache for design patterns, layout templates, and content chunks.
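The caching side of the Cost Optimizer can be sketched roughly as follows with the redis-py client; the key scheme and TTL are illustrative:

```python
import hashlib
import json
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)
TTL_SECONDS = 7 * 24 * 3600  # expire cached design artifacts after a week

def cache_key(kind: str, payload: str) -> str:
    """Build a stable key, e.g. 'layout:<sha256 of the paper's structural summary>'."""
    return f"{kind}:{hashlib.sha256(payload.encode()).hexdigest()}"

def get_or_generate(kind: str, payload: str, generate_fn):
    """Return a cached artifact if present; otherwise call the LLM and cache the result."""
    key = cache_key(kind, payload)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = generate_fn(payload)  # expensive LLM call
    cache.set(key, json.dumps(result), ex=TTL_SECONDS)
    return result
```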
Framework Integration: We developed an abstraction layer that translates between different agent communication protocols. The system defines standard interfaces for agent creation, message passing, and state management, with framework-specific adapters handling implementation details.
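In outline, the abstraction layer looks something like the sketch below; the interface and adapter names are illustrative, and the custom adapter body is a stub rather than the real framework code:

```python
from abc import ABC, abstractmethod
from typing import Any

class AgentAdapter(ABC):
    """Framework-neutral contract each adapter (LangChain, AutoGen, CrewAI, custom) must satisfy."""

    @abstractmethod
    def create_agent(self, role: str, system_prompt: str) -> Any:
        """Instantiate a native agent for the given role."""

    @abstractmethod
    def send(self, agent: Any, message: dict) -> dict:
        """Pass a message to the agent and return its structured reply."""

    @abstractmethod
    def get_state(self, agent: Any) -> dict:
        """Expose agent state for checkpointing and cost tracking."""

class CustomFrameworkAdapter(AgentAdapter):
    """Stub adapter for the in-house framework; other adapters translate the same three
    calls into their framework's native primitives (chains, conversations, crews)."""
    def create_agent(self, role, system_prompt):
        return {"role": role, "prompt": system_prompt, "history": []}
    def send(self, agent, message):
        agent["history"].append(message)
        return {"role": agent["role"], "content": "..."}  # placeholder reply
    def get_state(self, agent):
        return {"role": agent["role"], "turns": len(agent["history"])}
```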
Technical Stack: Backend implementation uses Python with FastAPI for API endpoints. LLM integration supports both OpenAI and Anthropic APIs with automatic fallback and load balancing. Document processing leverages PyMuPDF for structure extraction and Pillow for image manipulation. Redis provides distributed caching with TTL-based invalidation. The frontend uses React with Tailwind CSS for responsive design. Rendering supports both LaTeX compilation via pdflatex and HTML/CSS with MathJax for mathematical notation.
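The provider fallback can be expressed as a thin wrapper; the provider call functions below are placeholders that would wrap the OpenAI and Anthropic SDK calls, not their actual signatures:

```python
import logging
from typing import Callable

logger = logging.getLogger("llm-router")

def complete_with_fallback(prompt: str, providers: list[tuple[str, Callable[[str], str]]]) -> str:
    """Try each provider in order; fall back to the next on any error.

    `providers` is a list of (name, call) pairs, where each call wraps one SDK
    (e.g., OpenAI or Anthropic) and returns the completion text.
    """
    last_error = None
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # network errors, rate limits, provider outages
            logger.warning("provider %s failed: %s", name, exc)
            last_error = exc
    raise RuntimeError("all LLM providers failed") from last_error
```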
Challenges we ran into
Content Preservation in Design Optimization: Early versions suffered from LLM hallucinations where the model would rewrite equations incorrectly or invent figure captions during layout optimization. We solved this by implementing strict content preservation rules where original text and equations are locked as immutable objects, allowing only positional changes. A validation layer compares rendered output against source material, rejecting any modifications to scientific content.
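The preservation rule amounts to treating source text and equations as read-only objects and checking the rendered output against them. A simplified sketch, assuming the real validator also compares figures and captions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: layout agents cannot mutate scientific content
class LockedBlock:
    block_id: str
    kind: str              # "text", "equation", "caption"
    content: str

def validate_render(locked_blocks: list[LockedBlock], rendered_text: str) -> list[str]:
    """Return the ids of locked blocks whose content is missing or altered in the output."""
    violations = []
    for block in locked_blocks:
        if block.content not in rendered_text:
            violations.append(block.block_id)
    return violations

# A poster draft is rejected and regenerated whenever validate_render returns any violations.
```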
Quality-Cost Tradeoff: High-quality posters initially required GPT-4 at $0.03 per 1K tokens, causing costs to explode at scale. We developed a hybrid approach using GPT-4 for initial content extraction and critical design decisions, then switching to GPT-3.5-turbo for template filling and rendering. All GPT-4 outputs are cached for reuse. This reduced costs by 55% with less than 5% quality degradation.
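The hybrid approach is essentially a lookup from task type to model tier, with the cached expensive-tier outputs reused via the get_or_generate helper sketched earlier; the task labels here are illustrative:

```python
# Tasks that shape scientific content go to the stronger (more expensive) model;
# mechanical tasks go to the cheaper model.
MODEL_ROUTING = {
    "content_extraction": "gpt-4",
    "design_decisions":   "gpt-4",
    "template_filling":   "gpt-3.5-turbo",
    "rendering_prompts":  "gpt-3.5-turbo",
}

def pick_model(task: str) -> str:
    return MODEL_ROUTING.get(task, "gpt-3.5-turbo")  # default to the cheap tier
```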
Cold Start Problem: First-time users received generic posters because the system lacked preference data. We addressed this by building domain-based initial profiles using arXiv category analysis. We analyzed 500 published posters per domain to extract baseline patterns for layout density, figure prominence, and color schemes. A brief two-question onboarding process captures research field and preferred information density to initialize personalization.
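Initialization from domain priors plus the onboarding answers can be sketched as follows; the prior values are illustrative placeholders, not the numbers learned from the published-poster analysis:

```python
# Baseline layout preferences per arXiv-style domain, derived offline from published posters.
DOMAIN_PRIORS = {
    "cs.LG": {"layout_density": 0.6, "figure_prominence": 0.8, "color_intensity": 0.5},
    "cs.CL": {"layout_density": 0.8, "figure_prominence": 0.5, "color_intensity": 0.4},
    "q-bio": {"layout_density": 0.7, "figure_prominence": 0.7, "color_intensity": 0.6},
}

def initial_profile(domain: str, preferred_density: float) -> dict:
    """Blend the domain prior with the density answer from the two-question onboarding."""
    prior = dict(DOMAIN_PRIORS.get(domain, {"layout_density": 0.7,
                                            "figure_prominence": 0.6,
                                            "color_intensity": 0.5}))
    prior["layout_density"] = 0.5 * prior["layout_density"] + 0.5 * preferred_density
    return prior
```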
LaTeX Compilation Errors: Special characters, custom macros, and missing packages caused LaTeX compilation failures in 30% of initial attempts. We implemented a three-stage solution: preprocessing equations to escape special characters according to LaTeX conventions, maintaining a whitelist of safe packages with automatic installation, and falling back to HTML/MathJax rendering when LaTeX compilation fails. This reduced error rates to 3%.
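The preprocessing stage is mostly character escaping plus a package whitelist check. A minimal version of those two steps (applied to prose spans, not to equation bodies) might look like:

```python
# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}", "\\": r"\textbackslash{}",
}

def escape_latex(text: str) -> str:
    """Escape special characters in plain-text spans before template insertion."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)

SAFE_PACKAGES = {"amsmath", "graphicx", "booktabs", "xcolor", "geometry"}

def filter_packages(requested: list[str]) -> list[str]:
    """Drop packages outside the whitelist; compilation falls back to HTML/MathJax on failure."""
    return [pkg for pkg in requested if pkg in SAFE_PACKAGES]
```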
Quantifying Design Quality: Without ground truth for poster quality, we struggled to measure improvement. We conducted a user study with 50 researchers rating 200 posters across multiple dimensions. Factor analysis identified eight measurable quality components: readability score, visual balance, information density, color harmony, typography consistency, spatial efficiency, figure integration, and scientific accuracy. We built a composite quality metric:
$$Q = \sum_{i=1}^{8} w_i \cdot f_i(P)$$
where weights \(w_i\) were learned from user ratings via regression analysis and \(f_i\) represents normalized scoring functions for each quality factor.
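Given per-factor scores, the composite metric is a weighted sum, and the weights can be fit to the user-study ratings with ordinary least squares. A sketch using scikit-learn, with illustrative array shapes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# factor_scores: shape (n_posters, 8), normalized scores f_i(P) for the eight quality factors
# user_ratings:  shape (n_posters,), mean rating each poster received in the study
def fit_quality_weights(factor_scores: np.ndarray, user_ratings: np.ndarray) -> np.ndarray:
    reg = LinearRegression(fit_intercept=False).fit(factor_scores, user_ratings)
    return reg.coef_  # the learned weights w_i

def quality_score(weights: np.ndarray, factor_scores: np.ndarray) -> float:
    """Q = sum_i w_i * f_i(P) for a single poster's factor scores."""
    return float(np.dot(weights, factor_scores))
```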
Framework Compatibility: Different agentic frameworks have incompatible APIs and execution models. We implemented the adapter pattern with framework-specific translators that map our abstract agent interface to native framework primitives. We validated compatibility across LangChain's chain abstraction, AutoGen's multi-agent conversations, CrewAI's role-based agents, and our internal custom framework.
Accomplishments that we're proud of
We achieved 60% cost reduction after five user interactions, decreasing from $4.50 to $1.80 per poster through progressive caching and optimization. Our preference learning system reaches 89% accuracy in predicting user design choices after three feedback cycles.
The system successfully generates posters in under two minutes from paper upload to PDF download, maintaining this performance across papers of varying length and complexity. We validated the approach on over 100 real research papers spanning 12 domains including machine learning, computer vision, natural language processing, computational biology, theoretical physics, and social sciences.
Framework-agnostic deployment works seamlessly across four different agentic systems without modification to core logic. The modular architecture enables researchers to integrate the pipeline into existing workflows regardless of their preferred framework.
Most importantly, we demonstrated that self-evolution is practical for creative tasks. The system quantifiably improves both output quality and computational efficiency through user interaction, proving that personalization and cost reduction are complementary rather than competing objectives.
What we learned
Multi-modal reasoning requires decomposition: Single-pass LLM approaches failed for poster generation. Success required breaking the problem into specialized agents handling content extraction, visual hierarchy analysis, layout optimization, and rendering as separate but coordinated tasks.
Structured feedback outperforms free-form input: Early experiments with open-ended user comments provided noisy signals that were difficult to incorporate systematically. Structured mechanisms like A/B comparisons and Likert scales improved preference learning accuracy from 62% to 89%.
Caching strategy matters more than model choice: Intelligent caching of design patterns, layout templates, and validated content provided 60% cost reduction, far exceeding gains from model optimization or prompt engineering.
Domain-specific priors accelerate cold start: Generic initial posters received poor ratings. Incorporating research domain knowledge through analysis of published posters in each field dramatically improved first-impression quality.
Framework abstraction requires discipline: Three refactoring cycles were necessary to achieve true framework independence. The investment in clean abstractions proved valuable when adding support for new frameworks required minimal effort.
Personalization plateaus after a few iterations: Preference accuracy improvements diminish rapidly after 3-4 feedback cycles. Continuing to request feedback beyond this point provides minimal value and risks user fatigue.
What's next for Evolving Creativity
Near-term enhancements: We plan to add multi-language support for international conferences, collaborative editing capabilities for multi-author papers, and export formats including PowerPoint and Keynote in addition to PDF.
Conference-specific optimization: Different venues have distinct poster conventions. NeurIPS emphasizes visual clarity and figure prominence, while ACL favors text-heavy layouts with detailed methodology. We will build conference-specific profiles that automatically adjust design parameters based on submission venue.
Interactive poster capabilities: Modern conferences increasingly support digital poster sessions. We will extend the system to generate interactive posters with embedded animations, video demonstrations, and QR codes linking to code repositories or supplementary materials.
Integration with academic platforms: Direct integration with arXiv, OpenReview, and major conference management systems will enable one-click poster generation from paper identifiers without manual uploads.
Federated learning across institutions: To improve domain-specific models while preserving privacy, we will implement federated learning where institutions can contribute to shared design pattern libraries without exposing individual user preferences or paper content.
Expanded design space exploration: Current recommendations focus on proven design patterns. We will add an exploration mode that occasionally suggests novel layout approaches, expanding the design space while measuring user acceptance to identify emerging best practices.