Yet Another Papers With Code - A modern recreation of the Papers With Code platform
Important
Seeking Compute Sponsors for AI Agent Search
The advanced AI Agent Search feature requires significant computational resources. We are actively seeking compute sponsors to enable this feature on the public web version.
Current Status:
- Web Version: Agent Search is temporarily unavailable due to compute limitations
- Self-Hosted: Full Agent Search functionality is available for local deployment
- Sponsorship Needed: GPU compute resources or cloud credits to enable public access
If you or your organization can provide compute resources, please open an issue with the label compute-sponsorship. Your support will enable free AI-powered search for the entire research community.
Note: Users can deploy their own instance with full Agent Search capabilities by following the deployment guide.
A comprehensive machine learning research platform that provides access to academic papers, datasets, methods, and state-of-the-art benchmarks. This project aims to restore and enhance the functionality of the original Papers With Code website using modern web technologies.
YA-PapersWithCode is a full-stack application that recreates the popular Papers With Code platform, which provided a valuable service to the ML research community before its discontinuation. The project consists of:
- Modern React Frontend - Built with TypeScript, Tailwind CSS, and shadcn/ui components
- FastAPI Backend - Powered by SQLite database with full-text search capabilities
- AI-Powered Search - Advanced search agents for intelligent paper and dataset discovery
- Data Pipeline - Automated downloading and processing of research data
- Multi-modal Search: Papers, datasets, methods, and benchmarks
- AI-Powered Agents: Advanced search expansion using semantic understanding
- Full-text Search: SQLite-based search with optimization for research content
- Smart Filtering: Dynamic filters for modalities, tasks, languages, and more
- Papers: 50,000+ research papers with abstracts and code links
- Datasets: Comprehensive dataset catalog with 39 modalities and 500+ tasks
- Methods: Organized ML methods across domains (CV, NLP, RL, Audio)
- Benchmarks: State-of-the-art leaderboards and evaluation results
- Responsive Design: Mobile-first approach with adaptive layouts
- Real-time Filtering: Instant search results with dynamic filters
- Beautiful UI: Modern design inspired by shadcn/ui components
- Dark/Light Mode: Adaptive theming for better user experience
- RESTful API: Comprehensive API with interactive documentation
- Data Export: JSON/CSV export functionality for research datasets
- Extensible Architecture: Modular design for easy feature additions
- Docker Support: Containerized deployment options
- Python 3.8+ (backend)
- Node.js 16+ and npm (frontend)
- Git for version control
- Clone the repository

    git clone https://github.com/yourusername/YA-PapersWithCode.git
    cd YA-PapersWithCode

- Start the backend (Terminal 1)

    ./start_backend.sh

  This will:
  - Install Python dependencies using uv
  - Download PapersWithCode data (first run only)
  - Initialize the SQLite database
  - Start the API server on http://localhost:8000

- Start the frontend (Terminal 2)

    ./start_frontend.sh

  This will:
  - Install npm dependencies
  - Start the React dev server on http://localhost:5173

- Access the application
  - Frontend: http://localhost:5173
  - Backend API: http://localhost:8000
  - API Documentation: http://localhost:8000/docs
YA-PapersWithCode/
├── frontend/ya-paperswithcode/        # React frontend application
│   ├── src/
│   │   ├── components/                # UI components
│   │   │   ├── layout/                # Header, Footer, Layout
│   │   │   ├── papers/                # Paper-related components
│   │   │   ├── datasets/              # Dataset components
│   │   │   ├── methods/               # Method components
│   │   │   ├── search/                # Search components
│   │   │   └── ui/                    # shadcn/ui base components
│   │   ├── pages/                     # Page components
│   │   ├── lib/                       # Utilities and API client
│   │   └── types/                     # TypeScript definitions
│   └── package.json                   # Frontend dependencies
│
├── data/ya-paperswithcode-database/   # Backend API and database
│   ├── agent-search/                  # AI search agents
│   │   ├── paper_search.py            # Paper search agent
│   │   ├── dataset_search.py          # Dataset search agent
│   │   ├── manager.py                 # Search orchestration
│   │   └── config.json                # Agent configuration
│   ├── api_server.py                  # FastAPI server
│   ├── models.py                      # Database models
│   ├── semantic_search.py             # Search functionality
│   └── schema.sql                     # Database schema
│
├── data/download_scripts/             # Data acquisition
│   ├── download.py                    # Data downloader
│   └── README.md                      # Download documentation
│
├── start_backend.sh                   # Backend startup script
├── start_frontend.sh                  # Frontend startup script
└── requirements.txt                   # Python dependencies
The backend provides a comprehensive RESTful API with the following endpoints:
- GET /api/v1/papers - List papers with pagination
- GET /api/v1/papers/{paper_id} - Get a specific paper
- GET /api/v1/papers/arxiv/{arxiv_id} - Get a paper by arXiv ID
- GET /api/v1/search?q={query} - Full-text search
- GET /api/v1/datasets - List datasets with filtering
- GET /api/v1/methods - List ML methods
- GET /api/v1/tasks - List research tasks
- GET /api/v1/evaluations - Get benchmark results
- POST /api/v1/ai-search - Advanced AI-powered search
- POST /api/v1/agent-search - Multi-agent search orchestration
- GET /api/v1/statistics - Database statistics
- GET /api/v1/export/{format} - Data export (JSON/CSV)
Visit http://localhost:8000/docs for interactive API documentation.
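As a rough illustration of calling the search endpoint listed above from a script, here is a minimal Python sketch that uses only the standard library. It assumes the backend is running at http://localhost:8000 and that the endpoint returns a JSON body. The response schema is not described in this README, so the decoded value is returned untouched. The helper names `build_search_url` and `search_papers` are hypothetical and not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Backend address taken from the quick start; change it for other deployments.
BASE_URL = "http://localhost:8000"


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /api/v1/search?q={query} with a safely encoded query."""
    return f"{base_url}/api/v1/search?{urlencode({'q': query})}"


def search_papers(query: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Call the full-text search endpoint and decode the body as JSON.

    Assumption: the endpoint answers with JSON. The exact response
    shape is not documented here, so it is returned as-is.
    """
    with urlopen(build_search_url(query, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `search_papers("graph neural networks")` would issue the request. The interactive documentation at /docs remains the reference for parameters and response fields.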
The project includes sophisticated AI search agents that provide intelligent research discovery:
- Multi-layer Expansion: Discovers related papers through citation networks
- Semantic Understanding: Goes beyond keyword matching
- Research Graph Traversal: Explores connections between papers
- Quality Ranking: Prioritizes high-impact research
- Task-aware Search: Understands research task requirements
- Modality Matching: Finds datasets by data type (text, image, audio, etc.)
- Domain-specific Filtering: Specialized search for different ML domains
- Compatibility Assessment: Evaluates dataset suitability for specific tasks
- Agent Orchestration: Coordinates multiple search agents
- Query Understanding: Automatically selects appropriate search strategies
- Result Fusion: Combines results from multiple agents
- Performance Optimization: Balances accuracy and response time
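This README does not say how Result Fusion is implemented. As one possible approach, the sketch below uses reciprocal rank fusion (RRF), a common way to merge ranked lists from several retrievers. It is an illustrative stand-in, not the project's actual algorithm, and the function name and the constant `k = 60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Merge several ranked lists of item IDs into one scored ranking.

    Each item receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Items ranked highly by several agents
    accumulate the largest totals.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical outputs from two agents, best match first.
paper_agent = ["p1", "p2", "p3"]
dataset_agent = ["p2", "p3", "p4"]
fused = reciprocal_rank_fusion([paper_agent, dataset_agent])
```

RRF needs only rank positions, so it works even when the agents produce scores on incompatible scales.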
- React 18 with TypeScript
- Vite for build tooling
- Tailwind CSS for styling
- shadcn/ui component library
- React Router for navigation
- Axios for API communication
- React Query for data fetching
- FastAPI web framework
- SQLite database with FTS
- Pydantic for data validation
- Uvicorn ASGI server
- Asyncio for concurrent operations
- Full-text Search with SQLite FTS
- Semantic Search capabilities
- AI Search Agents for intelligent discovery
- Data Pipeline for automated updates
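To make the SQLite full-text search item concrete, here is a small self-contained sketch using Python's built-in sqlite3 module with an FTS5 virtual table. The README only says "FTS", so the choice of FTS5, the table name `papers_fts`, and its two columns are assumptions for illustration. The project's real schema lives in schema.sql and may differ. The example also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Tiny illustrative corpus; the real data comes from the download step.
SAMPLE_PAPERS = [
    ("Attention Is All You Need",
     "A sequence model built only from attention layers."),
    ("Deep Residual Learning for Image Recognition",
     "Residual connections ease the training of very deep networks."),
    ("Playing Atari with Deep Reinforcement Learning",
     "A convolutional network learns control policies from pixels."),
]


def build_index(papers: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over (title, abstract) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE papers_fts USING fts5(title, abstract)")
    conn.executemany(
        "INSERT INTO papers_fts (title, abstract) VALUES (?, ?)", papers
    )
    conn.commit()
    return conn


def search_titles(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[str]:
    """Return titles matching the query, best BM25 match first."""
    rows = conn.execute(
        "SELECT title FROM papers_fts WHERE papers_fts MATCH ? "
        "ORDER BY bm25(papers_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [title for (title,) in rows]


conn = build_index(SAMPLE_PAPERS)
matches = search_titles(conn, "residual")
```

The query string is passed as a bound parameter, which keeps it out of the SQL text. Strings that contain FTS5 operators or quotes may still need escaping before being used in MATCH.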
The project uses data from the official PapersWithCode Data Repository:
- papers-with-abstracts.json.gz - 50,000+ research papers
- links-between-papers-and-code.json.gz - Code implementation links
- evaluation-tables.json.gz - Benchmark results
- methods.json.gz - ML methods and techniques
- datasets.json.gz - Research datasets catalog
Data is automatically downloaded and processed during the initial setup.
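For readers who want to inspect one of these files by hand, the following sketch reads a gzip-compressed JSON file with the Python standard library. It assumes each .json.gz file holds a single JSON document and loads the whole file into memory. The helper name `load_json_gz` is hypothetical, and the project's own download.py may process the data differently.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Union


def load_json_gz(path: Union[str, Path]) -> Any:
    """Decompress a .json.gz file and parse its contents as JSON.

    The file is opened in text mode so that decoding and decompression
    happen in one pass while json.load consumes the stream.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_json_gz("papers-with-abstracts.json.gz")` would return the parsed contents, provided the file has already been downloaded to the working directory.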
cp .env.template .env
# Update .env manually
# Build backend
./start_backend.sh
# Build frontend
./start_frontend.sh
- Implement AI agent for paper search.
- Provide modular agent search, allowing users to design their own search strategies.
- Provide Docker deployment.
- Fetch images for datasets.
- Integrate Hugging Face daily papers into the frontend (linking out to Hugging Face).
- Provide a SOTA API that finds state-of-the-art results for datasets (planned for the next version, similar to paperswithcode's sota-extractor).
- Use NocoDB + PostgreSQL to replace SQLite.
- Improve agent search ability.
- Use Redis to improve performance.
We welcome contributions to improve YA-PapersWithCode! Here's how you can help:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
- Follow TypeScript best practices
- Write comprehensive tests
- Update documentation for new features
- Ensure responsive design for frontend changes
- Follow REST API conventions for backend changes
This project is licensed under the MIT License - see the LICENSE file for details.
The data from PapersWithCode is available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0).
- Papers With Code for their invaluable data and platform.
- shadcn/ui for the beautiful and functional UI components.
- Pasa by ByteDance for insights into advanced search architecture.
- cline for inspiration in developer tooling.
- The ML research community for their continuous contributions to open science.
If you encounter any issues or have questions:
- Check the Issues page
- Review the Setup Guide for troubleshooting
- Create a new issue with detailed information