chore(docs): Highlight the usage of query to filter data in Document Loader. (#252)

dishaprakash · averikitsch · web-flow · commit 78c0ce012340 · 2025-01-16T13:43:30.000-08:00
* docs: Highlight the usage of query to filter data in Document Loader.

* remove blank block

* Minor change

---------

Co-authored-by: Averi Kitsch <akitsch@google.com>
diff --git a/docs/document_loader.ipynb b/docs/document_loader.ipynb
@@ -229,34 +229,6 @@
         ")"
       ]
     },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Create a table (if not already exists)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from langchain_google_cloud_sql_pg import Column\n",
-        "\n",
-        "await engine.ainit_document_table(\n",
-        "    table_name=TABLE_NAME,\n",
-        "    content_column=\"product_name\",\n",
-        "    metadata_columns=[\n",
-        "        Column(\"id\", \"SERIAL\", nullable=False),\n",
-        "        Column(\"content\", \"VARCHAR\", nullable=False),\n",
-        "        Column(\"description\", \"VARCHAR\", nullable=False),\n",
-        "    ],\n",
-        "    metadata_json_column=\"metadata\",\n",
-        "    store_metadata=True,\n",
-        ")"
-      ]
-    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -286,21 +258,20 @@
       ]
     },
     {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "z-AZyzAQ7bsf"
-      },
-      "outputs": [],
+      "cell_type": "markdown",
+      "metadata": {},
       "source": [
-        "from langchain_google_cloud_sql_pg import PostgresLoader\n",
-        "\n",
-        "# Creating a basic PostgreSQL object\n",
-        "loader = await PostgresLoader.create(\n",
-        "    engine,\n",
-        "    table_name=TABLE_NAME,\n",
-        "    # schema_name=SCHEMA_NAME,\n",
-        ")"
+        "When creating a `PostgresLoader` to fetch data from Cloud SQL for PostgreSQL, you have two main options for specifying the data to load:\n",
+        "* Using the `table_name` argument: the loader fetches all the data from the given table.\n",
+        "* Using the `query` argument: you provide a custom SQL query to fetch the data, which gives you full control over the statement, including selecting specific columns, applying filters, sorting, and joining tables.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Load Documents using the `table_name` argument"
       ]
     },
     {
@@ -309,7 +280,7 @@
         "id": "PeOMpftjc9_e"
       },
       "source": [
-        "### Load Documents via default table\n",
+        "#### Load Documents via default table\n",
         "The loader returns a list of Documents from the table using the first column as page_content and all other columns as metadata. The default table will have the first column as\n",
         "page_content and the second column as metadata (JSON). Each row becomes a document. \n",
         "\n",
@@ -343,7 +314,7 @@
         "id": "kSkL9l1Hc9_e"
       },
       "source": [
-        "### Load documents via custom table/metadata or custom page content columns"
+        "#### Load documents via custom table/metadata or custom page content columns"
       ]
     },
     {
@@ -363,6 +334,42 @@
         "print(docs)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Load Documents using a SQL query\n",
+        "The `query` parameter accepts a custom SQL query, which can include filters so that only specific documents are loaded from the database."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "table_name = \"products\"\n",
+        "content_columns = [\"product_name\", \"description\"]\n",
+        "metadata_columns = [\"id\", \"content\"]\n",
+        "\n",
+        "loader = await PostgresLoader.create(\n",
+        "    engine=engine,\n",
+        "    query=f\"SELECT * FROM {table_name};\",\n",
+        "    content_columns=content_columns,\n",
+        "    metadata_columns=metadata_columns,\n",
+        ")\n",
+        "\n",
+        "docs = await loader.aload()\n",
+        "print(docs)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "**Note**: If the `content_columns` and `metadata_columns` are not specified, the loader will automatically treat the first returned column as the document’s `page_content` and all subsequent columns as `metadata`."
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {
@@ -396,10 +403,45 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "### Create PostgresSaver\n",
+        "## Create PostgresSaver\n",
         "The `PostgresSaver` allows for saving of pre-processed documents to the table using the first column as page_content and all other columns as metadata. This table can easily be loaded via a Document Loader or updated to be a VectorStore. The default table will have the first column as page_content and the second column as metadata (JSON)."
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Create a table (if not already exists)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from langchain_google_cloud_sql_pg import Column\n",
+        "\n",
+        "await engine.ainit_document_table(\n",
+        "    table_name=TABLE_NAME,\n",
+        "    content_column=\"product_name\",\n",
+        "    metadata_columns=[\n",
+        "        Column(\"id\", \"SERIAL\", nullable=False),\n",
+        "        Column(\"content\", \"VARCHAR\", nullable=False),\n",
+        "        Column(\"description\", \"VARCHAR\", nullable=False),\n",
+        "    ],\n",
+        "    metadata_json_column=\"metadata\",\n",
+        "    store_metadata=True,\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Create PostgresSaver"
+      ]
+    },
     {
       "cell_type": "code",
       "execution_count": null,
diff --git a/docs/vector_store.ipynb b/docs/vector_store.ipynb
@@ -586,6 +586,37 @@
     "\n",
     "print(docs)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Search for documents without a Vector Store\n",
+    "You may want to search documents based on Document metadata as a tool or as part of an exploratory workflow. The Document Loader can be used to customize the search and load data in the form of Documents from your database. Learn how to [Load Documents using a SQL query](https://github.com/googleapis/langchain-google-cloud-sql-pg-python/blob/main/docs/document_loader.ipynb).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_google_cloud_sql_pg import PostgresLoader\n",
+    "\n",
+    "table_name = \"products\"\n",
+    "content_columns = [\"product_name\", \"description\"]\n",
+    "metadata_columns = [\"id\", \"content\"]\n",
+    "\n",
+    "loader = await PostgresLoader.create(\n",
+    "    engine=engine,\n",
+    "    query=f\"SELECT * FROM {table_name};\",\n",
+    "    content_columns=content_columns,\n",
+    "    metadata_columns=metadata_columns,\n",
+    ")\n",
+    "\n",
+    "docs = await loader.aload()\n",
+    "print(docs)"
+   ]
   }
  ],
  "metadata": {