Chivier/YA-PapersWithCode

YA-PapersWithCode

Yet Another Papers With Code - A modern recreation of the Papers With Code platform

Important

🚀 Seeking Compute Sponsors for AI Agent Search

The advanced AI Agent Search feature requires significant computational resources. We are actively seeking compute sponsors to enable this feature on the public web version.

Current Status:

  • 🌐 Web Version: Agent Search is temporarily unavailable due to compute limitations
  • 💻 Self-Hosted: Full Agent Search functionality is available for local deployment
  • 🤝 Sponsorship Needed: GPU compute resources or cloud credits to enable public access

If you or your organization can provide compute resources, please open an issue with the label compute-sponsorship. Your support will enable free AI-powered search for the entire research community.

Note: Users can deploy their own instance with full Agent Search capabilities by following the deployment guide.

A comprehensive machine learning research platform that provides access to academic papers, datasets, methods, and state-of-the-art benchmarks. This project aims to restore and enhance the functionality of the original Papers With Code website using modern web technologies.

🌟 Overview

YA-PapersWithCode is a full-stack application that recreates the popular Papers With Code platform, which provided a valuable service to the ML research community before its discontinuation. The project consists of:

  • Modern React Frontend - Built with TypeScript, Tailwind CSS, and shadcn/ui components
  • FastAPI Backend - Powered by SQLite database with full-text search capabilities
  • AI-Powered Search - Advanced search agents for intelligent paper and dataset discovery
  • Data Pipeline - Automated downloading and processing of research data

✨ Key Features

🔍 Intelligent Search

  • Multi-modal Search: Papers, datasets, methods, and benchmarks
  • AI-Powered Agents: Advanced search expansion using semantic understanding
  • Full-text Search: SQLite-based search with optimization for research content
  • Smart Filtering: Dynamic filters for modalities, tasks, languages, and more

📚 Research Content

  • Papers: 50,000+ research papers with abstracts and code links
  • Datasets: Comprehensive dataset catalog with 39 modalities and 500+ tasks
  • Methods: Organized ML methods across domains (CV, NLP, RL, Audio)
  • Benchmarks: State-of-the-art leaderboards and evaluation results

🎨 Modern Interface

  • Responsive Design: Mobile-first approach with adaptive layouts
  • Real-time Filtering: Instant search results with dynamic filters
  • Beautiful UI: Modern design inspired by shadcn/ui components
  • Dark/Light Mode: Adaptive theming for better user experience

🔧 Developer Features

  • RESTful API: Comprehensive API with interactive documentation
  • Data Export: JSON/CSV export functionality for research datasets
  • Extensible Architecture: Modular design for easy feature additions
  • Docker Support: Containerized deployment options

🚀 Quick Start

Prerequisites

  • Python 3.8+ (backend)
  • Node.js 16+ and npm (frontend)
  • Git for version control

Installation

  1. Clone the repository

    git clone https://github.com/Chivier/YA-PapersWithCode.git
    cd YA-PapersWithCode
  2. Start the backend (Terminal 1)

    ./start_backend.sh

    This will:

    • Install Python dependencies using uv
    • Download PapersWithCode data (first run only)
    • Initialize SQLite database
    • Start API server on http://localhost:8000
  3. Start the frontend (Terminal 2)

    ./start_frontend.sh

    This will:

  4. Access the application

📁 Project Structure

YA-PapersWithCode/
├── frontend/ya-paperswithcode/     # React frontend application
│   ├── src/
│   │   ├── components/             # UI components
│   │   │   ├── layout/            # Header, Footer, Layout
│   │   │   ├── papers/            # Paper-related components
│   │   │   ├── datasets/          # Dataset components
│   │   │   ├── methods/           # Method components
│   │   │   ├── search/            # Search components
│   │   │   └── ui/                # shadcn/ui base components
│   │   ├── pages/                 # Page components
│   │   ├── lib/                   # Utilities and API client
│   │   └── types/                 # TypeScript definitions
│   └── package.json               # Frontend dependencies
│
├── data/ya-paperswithcode-database/ # Backend API and database
│   ├── agent-search/              # AI search agents
│   │   ├── paper_search.py        # Paper search agent
│   │   ├── dataset_search.py      # Dataset search agent
│   │   ├── manager.py             # Search orchestration
│   │   └── config.json            # Agent configuration
│   ├── api_server.py              # FastAPI server
│   ├── models.py                  # Database models
│   ├── semantic_search.py         # Search functionality
│   └── schema.sql                 # Database schema
│
├── data/download_scripts/         # Data acquisition
│   ├── download.py                # Data downloader
│   └── README.md                  # Download documentation
│
├── start_backend.sh               # Backend startup script
├── start_frontend.sh              # Frontend startup script
└── requirements.txt               # Python dependencies

🔌 API Documentation

The backend provides a comprehensive RESTful API with the following endpoints:

Papers

  • GET /api/v1/papers - List papers with pagination
  • GET /api/v1/papers/{paper_id} - Get specific paper
  • GET /api/v1/papers/arxiv/{arxiv_id} - Get paper by arXiv ID
  • GET /api/v1/search?q={query} - Full-text search

Datasets & Methods

  • GET /api/v1/datasets - List datasets with filtering
  • GET /api/v1/methods - List ML methods
  • GET /api/v1/tasks - List research tasks
  • GET /api/v1/evaluations - Get benchmark results

AI Search

  • POST /api/v1/ai-search - Advanced AI-powered search
  • POST /api/v1/agent-search - Multi-agent search orchestration
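
As a rough sketch of how a client might call the agent search endpoint, the snippet below builds a JSON POST request with the Python standard library. The request body is not documented in this README, so the `query` and `limit` field names are hypothetical placeholders; the interactive docs on a running server are the place to check the real schema.

```python
import json
import urllib.request

AGENT_SEARCH_URL = "http://localhost:8000/api/v1/agent-search"


def make_agent_search_request(query, limit=10):
    """Build (but do not send) a JSON POST request for the agent search endpoint."""
    # The field names "query" and "limit" are hypothetical placeholders.
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        AGENT_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_agent_search_request("datasets for low-resource speech recognition")
print(req.get_method(), req.full_url)

# To send it against a running backend:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     results = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the payload before a backend is available.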

Utilities

  • GET /api/v1/statistics - Database statistics
  • GET /api/v1/export/{format} - Data export (JSON/CSV)

Visit http://localhost:8000/docs for interactive API documentation.
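
For the read-only endpoints listed above, a minimal client could look like the sketch below. The endpoint paths and the `q` parameter come from the list; the `page` and `per_page` parameter names and the JSON response shape are assumptions that should be checked against the interactive documentation.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # API server address from the Quick Start


def build_url(path, **params):
    """Join BASE_URL with an endpoint path and optional query parameters."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path, **params):
    """Send a GET request and decode the JSON body (needs a running backend)."""
    with urllib.request.urlopen(build_url(path, **params), timeout=10) as resp:
        return json.load(resp)


# `page` and `per_page` are hypothetical parameter names.
print(build_url("/api/v1/papers", page=1, per_page=20))
print(build_url("/api/v1/search", q="graph neural networks"))
```

With the backend running, `get_json("/api/v1/statistics")` would fetch the database statistics.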

🧠 AI Search Agents

The project includes sophisticated AI search agents that provide intelligent research discovery:

Paper Search Agent

  • Multi-layer Expansion: Discovers related papers through citation networks
  • Semantic Understanding: Goes beyond keyword matching
  • Research Graph Traversal: Explores connections between papers
  • Quality Ranking: Prioritizes high-impact research
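
To make the idea of multi-layer expansion concrete, here is a minimal sketch that uses a breadth-first traversal over a citation graph with a depth limit. It is an illustration only and not the project's actual agent; the `expand_papers` function, the dictionary-based graph, and the paper IDs are all hypothetical.

```python
from collections import deque


def expand_papers(seed_ids, citations, max_depth=2):
    """Breadth-first expansion: map each reachable paper ID to its hop distance."""
    depth_of = {pid: 0 for pid in seed_ids}
    queue = deque(seed_ids)
    while queue:
        current = queue.popleft()
        if depth_of[current] == max_depth:
            continue  # do not expand beyond the last layer
        for neighbor in citations.get(current, []):
            if neighbor not in depth_of:  # skip papers already discovered
                depth_of[neighbor] = depth_of[current] + 1
                queue.append(neighbor)
    return depth_of


# Toy citation graph with made-up paper IDs.
citations = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
}
print(expand_papers(["A"], citations, max_depth=2))
```

The depth limit keeps the expansion bounded: with `max_depth=2`, paper `F` (three hops from `A`) is never visited.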

Dataset Search Agent

  • Task-aware Search: Understands research task requirements
  • Modality Matching: Finds datasets by data type (text, image, audio, etc.)
  • Domain-specific Filtering: Specialized search for different ML domains
  • Compatibility Assessment: Evaluates dataset suitability for specific tasks
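
Modality and task matching can be pictured as a filter over dataset records. The sketch below assumes each record carries `modalities` and `tasks` lists; that record layout, the `match_datasets` helper, and the catalog entries are invented for illustration and may not match the project's data model.

```python
def match_datasets(datasets, modality=None, task=None):
    """Return datasets whose modalities and tasks include the requested values."""

    def has(values, wanted):
        # A missing filter matches everything; comparison is case-insensitive.
        return wanted is None or wanted.lower() in {v.lower() for v in values}

    return [
        d
        for d in datasets
        if has(d.get("modalities", []), modality) and has(d.get("tasks", []), task)
    ]


# Made-up catalog entries for illustration only.
catalog = [
    {"name": "toy-images", "modalities": ["Images"], "tasks": ["Image Classification"]},
    {"name": "toy-speech", "modalities": ["Audio"], "tasks": ["Speech Recognition"]},
    {"name": "toy-captions", "modalities": ["Images", "Texts"], "tasks": ["Image Captioning"]},
]
print([d["name"] for d in match_datasets(catalog, modality="images")])
```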

Search Manager

  • Agent Orchestration: Coordinates multiple search agents
  • Query Understanding: Automatically selects appropriate search strategies
  • Result Fusion: Combines results from multiple agents
  • Performance Optimization: Balances accuracy and response time
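
Result fusion can be done in many ways, and this README does not say which one the Search Manager uses. As one common option, the sketch below applies reciprocal rank fusion (RRF) to merge ranked lists from two hypothetical agents; the function name, the constant `k=60`, and the paper IDs are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked ID lists; items ranked high in several lists score best."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))


keyword_hits = ["p1", "p2", "p3"]   # e.g. from full-text search
semantic_hits = ["p2", "p4", "p1"]  # e.g. from a semantic search agent
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

RRF only needs rank positions, so it works even when the agents produce scores on incompatible scales.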

🛠️ Technology Stack

Frontend

  • React 18 with TypeScript
  • Vite for build tooling
  • Tailwind CSS for styling
  • shadcn/ui component library
  • React Router for navigation
  • Axios for API communication
  • React Query for data fetching

Backend

  • FastAPI web framework
  • SQLite database with FTS
  • Pydantic for data validation
  • Uvicorn ASGI server
  • Asyncio for concurrent operations

Data & Search

  • Full-text Search with SQLite FTS
  • Semantic Search capabilities
  • AI Search Agents for intelligent discovery
  • Data Pipeline for automated updates
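
The following sketch shows how full-text search with SQLite can work from Python. The README mentions SQLite FTS without naming a version, so FTS5 is assumed here (it must be compiled into the local SQLite build), and the `papers_fts` table layout is hypothetical; the project's real schema lives in schema.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout, not the project's actual schema.
conn.execute("CREATE VIRTUAL TABLE papers_fts USING fts5(title, abstract)")
conn.executemany(
    "INSERT INTO papers_fts (title, abstract) VALUES (?, ?)",
    [
        ("Attention Is All You Need", "We propose the Transformer architecture."),
        ("Deep Residual Learning", "Residual networks ease training of deep models."),
    ],
)


def search(query, limit=10):
    """Return matching titles, best match first (lower bm25 means a better match)."""
    rows = conn.execute(
        "SELECT title FROM papers_fts WHERE papers_fts MATCH ? "
        "ORDER BY bm25(papers_fts) LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


print(search("transformer"))
```

The default FTS5 tokenizer is case-insensitive, so the query `transformer` matches `Transformer` in the abstract.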

📊 Data Sources

The project uses data from the official PapersWithCode Data Repository:

  • papers-with-abstracts.json.gz - 50,000+ research papers
  • links-between-papers-and-code.json.gz - Code implementation links
  • evaluation-tables.json.gz - Benchmark results
  • methods.json.gz - ML methods and techniques
  • datasets.json.gz - Research datasets catalog

Data is automatically downloaded and processed during the initial setup.
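
For readers who want to inspect the downloaded files by hand, the sketch below reads a gzip-compressed JSON file with the standard library. It assumes each file holds a single JSON array of records and that paper records carry an `arxiv_id` field; both are assumptions about the dump format. A small stand-in file is written first so the example runs without the real download.

```python
import gzip
import json
import os
import tempfile


def load_records(path):
    """Read a gzip-compressed JSON file and return the decoded object."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def with_arxiv_id(records):
    """Keep only records with a non-empty `arxiv_id` (assumed field name)."""
    return [r for r in records if r.get("arxiv_id")]


# Write a tiny stand-in file with made-up records.
sample = [
    {"title": "Paper A", "arxiv_id": "0000.00001"},
    {"title": "Paper B", "arxiv_id": None},
]
path = os.path.join(tempfile.mkdtemp(), "papers-sample.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    json.dump(sample, fh)

print([r["title"] for r in with_arxiv_id(load_records(path))])
```

`json.load` reads the whole file into memory, which is simple but may be heavy for the largest dumps.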

🚢 Deployment

cp .env.template .env

# Update .env manually

# Build backend
./start_backend.sh
# Build frontend
./start_frontend.sh

📝 TODO

  • Implement AI agent for paper search.
  • Provide modular agent search, allowing users to design their own search strategies.
  • Provide Docker deployment.
  • Fetch images for datasets.
  • Integrate Hugging Face daily papers into the frontend (linking out to Hugging Face).
  • Provide a SOTA API that finds state-of-the-art results for datasets (planned for the next version, similar to sota-extractor from paperswithcode).
  • Use NocoDB + PostgreSQL to replace SQLite.
  • Improve agent search ability.
  • Use Redis to improve performance.

🤝 Contributing

We welcome contributions to improve YA-PapersWithCode! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Commit your changes: git commit -m 'Add amazing feature'
  5. Push to the branch: git push origin feature/amazing-feature
  6. Open a Pull Request

Development Guidelines

  • Follow TypeScript best practices
  • Write comprehensive tests
  • Update documentation for new features
  • Ensure responsive design for frontend changes
  • Follow REST API conventions for backend changes

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

The data from PapersWithCode is available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0).

πŸ™ Acknowledgments

  • Papers With Code for their invaluable data and platform.
  • shadcn/ui for the beautiful and functional UI components.
  • Pasa by ByteDance for insights into advanced search architecture.
  • cline for inspiration in developer tooling.
  • The ML research community for their continuous contributions to open science.

📞 Support

If you encounter any issues or have questions:

  1. Check the Issues page
  2. Review the Setup Guide for troubleshooting
  3. Create a new issue with detailed information
