Conversation

Copilot AI commented on Nov 17, 2025

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

*This pull request was created as a result of the following prompt from Copilot chat.*
> Create a new branch called add/fastapi-ui and open a pull request that adds a minimal Dockerized FastAPI service and a tiny UI that invokes the existing scraper function by Python function name. Add the following files with the exact contents below (do not modify existing files):
> 
> Files and contents (paths relative to repo root):
> 
> 1) Dockerfile
> ```
> FROM python:3.11-slim
> 
> WORKDIR /app
> 
> # copy project files
> COPY . /app
> 
> # install API deps + repo deps if present
> RUN pip install --no-cache-dir -r requirements-api.txt \
>     && if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
> 
> ENV PYTHONUNBUFFERED=1
> 
> EXPOSE 8000
> 
> CMD ["uvicorn", "src.api.app:app", "--host", "0.0.0.0", "--port", "8000", "--proxy-headers"]
> ```
> 
> 2) docker-compose.yml
> ```
> version: "3.8"
> services:
>   web:
>     build: .
>     ports:
>       - "8000:8000"
>     environment:
>       - SCRAPER_FUNCTION=${SCRAPER_FUNCTION:-src.scraper.threads_scraper:scrape_url}
>       - API_KEY=${API_KEY:-}
>     volumes:
>       - ./:/app
> ```
> 
> 3) requirements-api.txt
> ```
> fastapi==0.105.0
> uvicorn[standard]==0.23.1
> python-multipart==0.0.6
> ```
> 
> 4) src/api/app.py
> ```
> import os
> import json
> import uuid
> import time
> import asyncio
> import importlib
> import importlib.util
> from typing import Any
> from fastapi import FastAPI, BackgroundTasks, HTTPException, Request
> from fastapi.responses import JSONResponse, RedirectResponse, FileResponse
> from fastapi.staticfiles import StaticFiles
> from pydantic import BaseModel
> 
> APP_DIR = os.getcwd()
> RESULTS_DIR = os.path.join(APP_DIR, "output", "results")
> os.makedirs(RESULTS_DIR, exist_ok=True)
> 
> SCRAPER_FUNCTION = os.getenv("SCRAPER_FUNCTION", "src.scraper.threads_scraper:scrape_url")
> API_KEY = os.getenv("API_KEY", "").strip()
> 
> app = FastAPI(title="Threads Scraper API (minimal)")
> 
> # serve the UI folder at /ui
> ui_path = os.path.join(APP_DIR, "web", "ui")
> if os.path.isdir(ui_path):
>     app.mount("/ui", StaticFiles(directory=ui_path), name="ui")
> 
> jobs = {}  # job_id -> { status, created_at, updated_at, result_path, error }
> 
> class ScrapeRequest(BaseModel):
>     url: str
> 
> def load_function(module_name: str, func_name: str):
>     """
>     Try to import module.func. If module isn't a package importable by name,
>     fall back to loading by file path relative to repo root.
>     """
>     try:
>         module = importlib.import_module(module_name)
>         return getattr(module, func_name)
>     except Exception:
>         # try loading from file path
>         try:
>             path = module_name.replace(".", os.sep) + ".py"
>             filepath = os.path.join(APP_DIR, path)
>             if not os.path.exists(filepath):
>                 raise FileNotFoundError(f"Module file not found: {filepath}")
>             spec = importlib.util.spec_from_file_location(module_name, filepath)
>             module = importlib.util.module_from_spec(spec)
>             spec.loader.exec_module(module)
>             return getattr(module, func_name)
>         except Exception as e:
>             raise ImportError(f"Could not load function {module_name}:{func_name} ({e})")
> 
> def save_result(job_id: str, payload: Any):
>     out_path = os.path.join(RESULTS_DIR, f"{job_id}.json")
>     with open(out_path, "w", encoding="utf-8") as f:
>         json.dump(payload, f, ensure_ascii=False, indent=2)
>     return out_path
> 
> def run_scraper_sync(func, url: str):
>     """
>     Call the scraper function. Handle both sync and async functions.
>     """
>     if asyncio.iscoroutinefunction(func):
>         return asyncio.run(func(url))
>     else:
>         return func(url)
> 
> def process_job(job_id: str, url: str):
>     jobs[job_id]["status"] = "running"
>     jobs[job_id]["updated_at"] = time.time()
>     module_part, func_part = SCRAPER_FUNCTION.split(":") if ":" in SCRAPER_FUNCTION else (SCRAPER_FUNCTION, "scrape_url")
>     try:
>         func = load_function(module_part, func_part)
>     except Exception as e:
>         jobs[job_id]["status"] = "error"
>         jobs[job_id]["error"] = f"Import error: {e}"
>         jobs[job_id]["updated_at"] = time.time()
>         return
> 
>     try:
>         result = run_scraper_sync(func, url)
>         # ensure it's JSON-serializable (best effort)
>         save_result(job_id, {"url": url, "result": result, "fetched_at": time.time()})
>         jobs[job_id]["status"] = "done"
>         jobs[job_id]["result_path"] = os.path.relpath(os.path.join(RESULTS_DIR, f"{job_id}.json"), APP_DIR)
>     except Exception as e:
>         jobs[job_id]["status"] = "error"
>         jobs[job_id]["error"] = f"Scrape error: {e}"
>     finally:
>         jobs[job_id]["updated_at"] = time.time()
> 
> @app.get("/", include_in_schema=False)
> def root():
>     # redirect to UI if available
>     if os.path.isdir(ui_path):
>         return RedirectResponse(url="/ui/")
>     return JSONResponse({"ok": True, "info": "Threads Scraper API. Visit /ui for the UI (if present)."})
> 
> @app.post("/api/scrape")
> def scrape(req: ScrapeRequest, background_tasks: BackgroundTasks, request: Request):
>     if API_KEY:
>         provided = request.headers.get("x-api-key", "")
>         if provided != API_KEY:
>             raise HTTPException(status_code=401, detail="invalid API key")
>     url = req.url.strip()
>     if not url:
>         raise HTTPException(status_code=400, detail="url is required")
>     job_id = uuid.uuid4().hex
>     jobs[job_id] = {
>         "status": "queued",
>         "created_at": time.time(),
>         "updated_at": time.time(),
>         "url": url
>     }
>     # launch background job
>     background_tasks.add_task(process_job, job_id, url)
>     return {"job_id": job_id, "status": "queued"}
> 
> @app.get("/api/result/{job_id}")
> def get_result(job_id: str):
>     meta = jobs.get(job_id)
>     if not meta:
>         raise HTTPException(status_code=404, detail="job not found")
>     out = {"job_id": job_id, "status": meta.get("status"), "created_at": meta.get("created_at"), "updated_at": meta.get("updated_at")}
>     if meta.get("status") == "done" and meta.get("result_path"):
>         try:
>             p = os.path.join(APP_DIR, meta["result_path"])
>             with open(p, "r", encoding="utf-8") as f:
>                 out["result"] = json.load(f)
>         except Exception as e:
>             out["result_error"] = f"Failed to read result file: {e}"
>     if meta.get("status") == "error":
>         out["error"] = meta.get("error")
>     return out
> 
> @app.get("/api/raw-result/{job_id}")
> def raw_result(job_id: str):
>     # Returns raw JSON file if present
>     meta = jobs.get(job_id)
>     if not meta:
>         raise HTTPException(status_code=404, detail="job not found")
>     if meta.get("result_path"):
>         fp = os.path.join(APP_DIR, meta["result_path"])
>         if os.path.exists(fp):
>             return FileResponse(fp, media_type="application/json", filename=f"{job_id}.json")
>     raise HTTPException(status_code=404, detail="result not found")
> ```
> 
> 5) web/ui/index.html
> ```
> <!doctype html>
> <html lang="en">
> <head>
>   <meta charset="utf-8"/>
>   <meta name="viewport" content="width=device-width,initial-scale=1"/>
>   <title>Threads Scraper — UI</title>
>   <style>
>     body { font-family: Arial, Helvetica, sans-serif; padding: 24px; max-width: 900px; margin: auto; }
>     input, button { padding: 8px; font-size: 16px; }
>     #result { white-space: pre-wrap; background:#f7f7f7; padding:12px; margin-top:12px; border-radius:6px; }
>     label { display:block; margin-bottom:6px; font-weight:600; }
>   </style>
> </head>
> <body>
>   <h2>Threads Scraper — simple UI</h2>
>   <p>Enter a public Threads URL to scrape. Results will appear below when ready.</p>
> 
>   <div>
>     <label>Threads URL</label>
>     <input id="url" type="text" placeholder="https://www.threads.net/..." style="width:70%"/>
>     <label style="display:inline-block; margin-left:12px;">API Key (optional)</label>
>     <input id="apikey" type="text" placeholder="x-api-key (if required)" style="width:20%"/>
>     <button id="go">Scrape</button>
>   </div>
> 
>   <div id="status" style="margin-top:12px;"></div>
>   <pre id="result"></pre>
> 
> <script>
> const go = document.getElementById('go');
> const urlEl = document.getElementById('url');
> const statusEl = document.getElementById('status');
> const resultEl = document.getElementById('result');
> const keyEl = document.getElementById('apikey');
> 
> async function submit() {
>   resultEl.textContent = '';
>   statusEl.textContent = 'Submitting job...';
>   const url = urlEl.value.trim();
>   if (!url) { statusEl.textContent = 'Enter a URL'; return; }
>   const headers = {'Content-Type': 'application/json'};
>   const key = keyEl.value.trim();
>   if (key) headers['x-api-key'] = key;
>   try {
>     const res = await fetch('/api/scrape', { method: 'POST', headers, body: JSON.stringify({url}) });
>     if (!res.ok) {
>       const err = await res.json();
>       statusEl.textContent = 'Error: ' + (err.detail || res.statusText);
>       return;
>     }
>     const data = await res.json();
>     const job = data.job_id;
>     statusEl.textContent = 'Job queued: ' + job + '. Polling...';
>     poll(job, headers);
>   } catch (e) {
>     statusEl.textContent = 'Network error: ' + e;
>   }
> }
> 
> async function poll(job, headers) {
>   const url = '/api/result/' + job;
>   try {
>     const res = await fetch(url, { headers });
>     if (!res.ok) {
>       const j = await res.json().catch(()=>({}));
>       statusEl.textContent = 'Error: ' + (j.detail || res.statusText);
>       return;
>     }
>     const data = await res.json();
>     statusEl.textContent = 'Status: ' + data.status;
>     if (data.status === 'done') {
>       resultEl.textContent = JSON.stringify(data.result, null, 2);
>       return;
>     } else if (data.status === 'error') {
>       resultEl.textContent = 'Error: ' + (data.error || 'unknown');
>       return;
>     }
>     setTimeout(()=>poll(job, headers), 2000);
>   } catch (e) {
>     statusEl.textContent = 'Polling error: ' + e;
>   }
> }
> 
> go.addEventListener('click', submit);
> </script>
> </body>
> </html>
> ```
> 
> 6) README-run.md
> ```
> Quick run (Docker + docker-compose)
> 
> 1) Optional: set environment variables in your shell or a .env file:
>    - API_KEY (optional) — if set, include header x-api-key on POST requests / from UI
>    - SCRAPER_FUNCTION (optional) — Python import path for your scraper function. Default:
>        src.scraper.threads_scraper:scrape_url
>      The function should accept a single argument (url: str) and return a JSON-serializable dict.
> 
> 2) Build and run:
>    docker-compose up --build
> 
> 3) UI:
>    Visit http://localhost:8000/ui/ to submit a URL and view results.
> 
> 4) API:
>    POST /api/scrape    { "url": "https://threads.net/..." }  => { "job_id": "..." }
>    GET  /api/result/{job_id}  => status + result (when ready)
>    GET  /api/raw-result/{job_id} => raw JSON file download
> 
> Notes:
> - Results are stored under output/results/{job_id}.json
> - This demo uses an in-process background task for simplicity. Replace with Celery/Redis for robust production queueing.
> - If your scraper module isn't a package importable as a module name, the system will try to load the Python file from src/... path.
> ```
> 
> Additional PR metadata:
> - Branch name: add/fastapi-ui
> - Commit message: "Add FastAPI demo API and UI (scraper integration)"
> - PR title: Add FastAPI demo API and simple UI for Threads Scraper
> - PR body: Adds a minimal FastAPI app, static UI, Dockerfile, docker-compose, and run instructions. The API calls the scraper function defined by SCRAPER_FUNCTION (default src.scraper.threads_scraper:scrape_url). Results are saved to output/results.
> 
> Please create the branch, add the files, commit, push, and open a pull request against main with the given title and body. Do not modify existing files. If any errors occur (e.g., merge conflicts or missing directories), report them back in the response.
> 
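For reference beyond the quoted prompt: the `SCRAPER_FUNCTION` contract described above expects a target that takes a single `url: str` and returns a JSON-serializable dict. A minimal stand-in satisfying that contract could look like the sketch below; the path `src/scraper/threads_scraper.py` and the name `scrape_url` are just the prompt's defaults, and the returned fields are purely illustrative.

```python
# src/scraper/threads_scraper.py -- illustrative stub only; the repo's real
# scraper is expected to supply its own scrape_url implementation.
from typing import Any, Dict


def scrape_url(url: str) -> Dict[str, Any]:
    """Accept one URL and return a JSON-serializable dict.

    This stub only echoes the URL back; a real implementation would fetch
    and parse the Threads page here.
    """
    return {"url": url, "posts": [], "note": "stub result"}
```

Either a plain function or an `async def` works here, since `run_scraper_sync` in `src/api/app.py` checks `asyncio.iscoroutinefunction` before calling the target.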

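Likewise, once `docker-compose up --build` is running, the submit-and-poll flow from README-run.md can be exercised with the standard library alone. This is a sketch that assumes the service is reachable at http://localhost:8000 and that `API_KEY` is unset; if you set a key, add an `x-api-key` header to both requests.

```python
import json
import time
import urllib.request

BASE = "http://localhost:8000"  # assumed host/port from docker-compose.yml

# Submit a scrape job; the URL below is a placeholder, as in the UI.
req = urllib.request.Request(
    f"{BASE}/api/scrape",
    data=json.dumps({"url": "https://www.threads.net/..."}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    job_id = json.load(resp)["job_id"]

# Poll /api/result/{job_id} every 2 seconds, mirroring the UI's behaviour.
while True:
    with urllib.request.urlopen(f"{BASE}/api/result/{job_id}") as resp:
        status = json.load(resp)
    if status["status"] in ("done", "error"):
        print(json.dumps(status, indent=2))
        break
    time.sleep(2)
```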

Copilot AI self-assigned this on Nov 17, 2025.
Copilot stopped work on behalf of headknod-art due to an error on November 17, 2025 at 23:26.