TL;DR: Key points in 30s

Paper2Slides cuts the time needed to create slides from a paper by combining Gemini 2.0 Flash with Marp. The model’s large context window takes in the whole paper at once, and Marp turns the Markdown output into styled slides.

  • Utilizes Gemini 2.0 Flash's large context to input entire papers
  • High-speed and beautiful slide generation with Marp (Markdown ecosystem)
  • Fast development with Next.js and Vercel
  • Prompt engineering specialized for slide structure

Introduction

In modern research and business, the speed of information gathering and sharing is more important than ever. However, the task of “reading a paper and summarizing it into presentation slides” remains one of the most time-consuming creative tasks.

To solve this problem, I developed Paper2Slides, an AI application that automatically generates presentation slides from an uploaded PDF of a paper.

In this design document, I will share the technical background, architecture, and implementation details of Paper2Slides.

The goal of this project is not just “summarization”. It is to build an assistant that carries the work through to the final output format (slides), so that humans can focus only on checking and presenting.

Project Overview

Concept

  • Zero Touch: Minimal user settings, immediate generation with one upload.
  • Beauty as Default: Beautiful slide designs by default using Marp themes.
  • Developer Friendly: Adopts an open architecture based on Markdown.

Core Technologies

| Layer | Technology Used | Rationale |
| --- | --- | --- |
| Frontend | Next.js (App Router) | High DX and performance |
| Styling | Tailwind CSS | Efficient UI development |
| Large Language Model | Gemini 2.0 Flash | Large context window and speed |
| Slide Engine | Marp / Marpit | Standardization with Markdown |
| Deployment | Vercel | Seamless CI/CD |

Architecture

The system consists of a simple yet robust serverless architecture.

1. Large Context PDF Processing

Paper2Slides uses Gemini 2.0 Flash. Because its context window is 1 million tokens, even a very long paper, or several related papers, can be passed to the model in a single request.

Conventionally, RAG (Retrieval-Augmented Generation) was used to retrieve only the relevant parts of a document and pass them to the model. However, structuring a paper as slides often requires a grasp of its overall context, and here Gemini’s large context window provides a significant advantage.
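Whether a paper (or a set of papers) fits in the window can be estimated before sending anything. The sketch below assumes a rough rule of thumb of about four characters per token for English text, which is not Gemini’s actual tokenizer, and a hypothetical reserve for the prompt and the response. `estimateTokens`, `fitsInContext`, and the constants are illustrative names, not part of Paper2Slides.

```typescript
// Rough pre-flight check: does the extracted text fit in the context window?
// ASSUMPTION: ~4 characters per token is a heuristic for English text,
// not the real Gemini tokenizer. Use the API's token counting for exact numbers.
const CONTEXT_WINDOW_TOKENS = 1000000; // figure stated in this document
const CHARS_PER_TOKEN = 4; // heuristic (assumption)

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitsInContext(
  texts: string[],
  reservedTokens = 20000 // hypothetical budget for system prompt + output
): { totalTokens: number; fits: boolean } {
  const totalTokens = texts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return {
    totalTokens,
    fits: totalTokens + reservedTokens <= CONTEXT_WINDOW_TOKENS,
  };
}
```

If the estimate is close to the limit, an exact count from the API is the safer basis for deciding whether to send one paper or several.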

2. Markdown-based Slide Generation

The slide engine is Marp, a tool that converts Markdown into presentation slides.

Why Marp?

  • Consistency: LLMs produce Markdown naturally, so the model’s output can be fed to Marp without a conversion step.
  • Version Control: Slide content can be managed as text.
  • Styling: Can be easily customized using CSS.
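Because a Marp deck is plain text, it can be assembled with ordinary string operations. The following is a minimal sketch, not the Paper2Slides implementation: the `Slide` type and `buildDeck` function are hypothetical names. `theme` and `paginate` are Marp directives, and `marp: true` is the front-matter flag that enables Marp rendering in editors such as Marp for VS Code.

```typescript
// Assemble a Marp deck from per-slide content.
// `Slide` and `buildDeck` are hypothetical names used only for this sketch.
interface Slide {
  title: string;
  bullets: string[];
}

function buildDeck(slides: Slide[], theme = "default"): string {
  // Front matter carrying Marp directives for the whole deck.
  const frontMatter = [
    "---",
    "marp: true",
    `theme: ${theme}`,
    "paginate: true",
    "---",
  ].join("\n");

  const pages = slides.map(
    (s) => `# ${s.title}\n\n` + s.bullets.map((b) => `- ${b}`).join("\n")
  );

  // Marp starts a new slide at each horizontal rule (---).
  return frontMatter + "\n\n" + pages.join("\n\n---\n\n") + "\n";
}
```

The returned string can be saved as a `.md` file and kept under version control like any other text.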

Key Implementation Details

Prompt Engineering

The most important part of this application is the “prompt for slide generation”. I use the following system prompt to direct structure and design.

System Prompt Essentials

  • Role Specification: “You are a professional presentation designer.”
  • Structure Instruction: “Each slide must have one H1 heading and 3-5 bullet points.”
  • Format Instruction: “Output strictly according to the Marp Markdown format. Use --- for slide dividers.”
  • Content Instruction: “Extract key charts and equations from the paper and represent them properly.”
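The four instructions can be joined into a single prompt string, and the structure rule can be checked mechanically on the model’s output. The sketch below is an illustration under assumptions: `findStructureViolations` is a hypothetical helper, and it expects a deck whose front matter has already been removed, so that every `---` line is a slide divider.

```typescript
// The four instructions quoted in this section, joined into one system prompt.
const SYSTEM_PROMPT = [
  "You are a professional presentation designer.",
  "Each slide must have one H1 heading and 3-5 bullet points.",
  "Output strictly according to the Marp Markdown format. Use --- for slide dividers.",
  "Extract key charts and equations from the paper and represent them properly.",
].join("\n");

// Hypothetical post-check: report slides that break the structure rule.
// ASSUMPTION: the deck has no front matter, or it was stripped beforehand.
function findStructureViolations(markdown: string): string[] {
  const problems: string[] = [];
  const slides = markdown.split(/^---[ \t]*$/m);
  slides.forEach((slide, i) => {
    const lines = slide.split("\n");
    const h1 = lines.filter((l) => /^# /.test(l)).length;
    const bullets = lines.filter((l) => /^\s*[-*] /.test(l)).length;
    if (h1 !== 1) {
      problems.push(`slide ${i + 1}: expected 1 H1, found ${h1}`);
    }
    if (bullets < 3 || bullets > 5) {
      problems.push(`slide ${i + 1}: expected 3-5 bullets, found ${bullets}`);
    }
  });
  return problems;
}
```

A non-empty result could be used to retry generation or to flag the affected slides for manual review.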

PDF Extraction and Text Preprocessing

Text is extracted from the PDF with a library such as pdf-parse or a cloud OCR service. To save tokens, running headers and footers and the bibliography are removed as far as possible before the text is sent to Gemini.
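One way to do this cleanup is sketched below. It assumes the extraction step yields one string per page. The helper names, the 60% repetition threshold, and the heading pattern are hypothetical choices for illustration, not the actual Paper2Slides rules.

```typescript
// ASSUMPTION: `pages` holds the extracted text of each PDF page, one string per page.

// Drop lines that repeat on at least `minShare` of the pages (running
// headers/footers) and lines that consist only of a page number.
function stripRepeatedLines(pages: string[], minShare = 0.6): string[] {
  const counts: Record<string, number> = Object.create(null);
  for (const page of pages) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const raw of page.split("\n")) {
      const line = raw.trim();
      if (line && !seen[line]) {
        seen[line] = true; // count each distinct line once per page
        counts[line] = (counts[line] || 0) + 1;
      }
    }
  }
  const threshold = Math.max(2, Math.ceil(pages.length * minShare));
  return pages.map((page) =>
    page
      .split("\n")
      .filter((raw) => {
        const line = raw.trim();
        return (counts[line] || 0) < threshold && !/^\d+$/.test(line);
      })
      .join("\n")
  );
}

// Cut everything from a standalone "References"/"Bibliography" heading onward.
function dropBibliography(text: string): string {
  const match = /^\s*(\d+\.?\s+)?(references|bibliography)\s*$/im.exec(text);
  return match ? text.slice(0, match.index).replace(/\s+$/, "") : text;
}
```

Papers whose appendices follow the reference list would lose them with this cut, so a real pipeline may need a more careful rule.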

Real-time Preview

The frontend uses @marp-team/marp-react to render a real-time preview of the generated Markdown. This lets the user check the quality of the slides while the AI is still generating them.
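While output is still streaming in, the last slide is usually incomplete. A preview can treat it separately from the finished ones. The helper below is a hypothetical sketch of that split. It does not use the marp-react API, and it assumes the front matter has been stripped so that every `---` line is a slide divider.

```typescript
// Split streamed Markdown into finished slides and the slide still being written.
// ASSUMPTION: front matter was removed, so each `---` line is a slide divider.
function splitStreamedDeck(buffer: string): { complete: string[]; pending: string } {
  const parts = buffer.split(/^---[ \t]*$/m);
  // Everything after the last divider is still in progress.
  const pending = parts.pop() || "";
  return {
    complete: parts.map((p) => p.trim()),
    pending: pending.trim(),
  };
}

// Usage: append each streamed chunk to a buffer and re-split.
let buffer = "";
for (const chunk of ["# A\n- x\n", "---\n# B\n", "- y"]) {
  buffer += chunk;
  const { complete, pending } = splitStreamedDeck(buffer);
  console.log(complete.length, JSON.stringify(pending));
}
```

Finished slides can then be rendered once and reused, so that only the pending slide is re-rendered as new text arrives.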

Challenges and Solutions

1. AI “Hallucination” (Mistakes in Output)

Problem: The model occasionally cites data that does not exist in the paper or summarizes a passage incorrectly.

Solution: Require citations. The prompt explicitly instructs the model to “Indicate which section the information is from,” so the user can trace each claim back to the paper and double-check it.
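The citation requirement can also be checked automatically. The sketch below assumes the prompt asks for a tag such as “[Section 3.2]” at the end of each bullet. That tag format, `CITATION_INSTRUCTION`, and `findUncitedBullets` are hypothetical and chosen only for this example.

```typescript
// ASSUMPTION: the model is asked to end each bullet with a tag like "[Section 3.2]".
// The tag format is hypothetical; the source only says that sections are indicated.
const CITATION_INSTRUCTION =
  'Indicate which section the information is from, e.g. "[Section 3.2]" at the end of each bullet.';

const CITATION_TAG = /\[Section\s+[\w.]+\]\s*$/;

// Return every bullet line that lacks a trailing section tag.
function findUncitedBullets(markdown: string): string[] {
  return markdown
    .split("\n")
    .filter((line) => /^\s*[-*] /.test(line))
    .filter((line) => !CITATION_TAG.test(line))
    .map((line) => line.trim());
}
```

A check like this only confirms that a tag is present. Whether the cited section actually supports the claim still has to be verified by the user.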

2. Difficulty in Styling

Problem: Complex layouts (e.g., two columns) are difficult to express in plain Markdown.

Solution: Use Marp’s directives. Predefined CSS classes are embedded in the generated Markdown, which lets the LLM request complex layouts without writing any CSS itself.
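In Marp, a class can be attached to a single slide with the scoped `_class` directive, written as an HTML comment. The sketch below shows one way to add such a directive. The helper `withLayoutClass`, the class name `cols-2`, and the accompanying CSS are hypothetical and would have to match whatever the deck’s theme actually defines.

```typescript
// Tag one slide with a layout class via Marp's scoped `_class` directive.
// `withLayoutClass` and the class name used below are hypothetical.
function withLayoutClass(slideMarkdown: string, className: string): string {
  // Reject anything that is not a plain CSS class name, so model output
  // cannot close the comment or inject markup.
  if (!/^[A-Za-z][\w-]*$/.test(className)) {
    throw new Error(`invalid class name: ${className}`);
  }
  return `<!-- _class: ${className} -->\n\n${slideMarkdown.trim()}`;
}

// Hypothetical theme CSS for the class above (untested layout assumption):
// Marp puts the class on the slide's <section> element.
const TWO_COLUMN_CSS = `
section.cols-2 ul {
  columns: 2;
}
`;

console.log(withLayoutClass("# Comparison\n\n- A\n- B\n- C\n- D", "cols-2"));
```

Keeping the set of allowed class names small and listing them in the prompt limits the model to layouts the theme can actually render.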

Future Roadmap

  1. Multi-language Support: Addition of translation functions during slide generation.
  2. Chart Generation: Integration with Mermaid.js for automatic generation of charts.
  3. Template Store: Allowing users to upload and share custom Marp themes.
  4. Integration: PowerPoint (PPTX) export support.

Conclusion

Paper2Slides is more than just an AI summarization tool; it is a “creative enhancer” for researchers and business people. By automating the mechanical work of converting information from one format to another, it lets humans return to the essential task of “thinking and sharing”.

The future of AI tools lies in “output format integration”. Bridging the gap from text to visual presentation will significantly change human knowledge sharing.

How much did you understand the Paper2Slides design?

Q1. What is the biggest reason for adopting Gemini 2.0 Flash?

Q2. Which tool is used for the slide engine?

Q3. What is the recommended solution for the 'Hallucination' problem?