Commit 78c0ce0

chore(docs): Highlight the usage of query to filter data in Document Loader. (#252)
* docs: Highlight the usage of query to filter data in Document Loader.
* remove blank block
* Minor change

Co-authored-by: Averi Kitsch <akitsch@google.com>
1 parent a8cc5a2 commit 78c0ce0

2 files changed: +118 -45 lines changed

docs/document_loader.ipynb

Lines changed: 87 additions & 45 deletions
@@ -229,34 +229,6 @@
229229
")"
230230
]
231231
},
232-
{
233-
"cell_type": "markdown",
234-
"metadata": {},
235-
"source": [
236-
"### Create a table (if not already exists)"
237-
]
238-
},
239-
{
240-
"cell_type": "code",
241-
"execution_count": null,
242-
"metadata": {},
243-
"outputs": [],
244-
"source": [
245-
"from langchain_google_cloud_sql_pg import Column\n",
246-
"\n",
247-
"await engine.ainit_document_table(\n",
248-
" table_name=TABLE_NAME,\n",
249-
" content_column=\"product_name\",\n",
250-
" metadata_columns=[\n",
251-
" Column(\"id\", \"SERIAL\", nullable=False),\n",
252-
" Column(\"content\", \"VARCHAR\", nullable=False),\n",
253-
" Column(\"description\", \"VARCHAR\", nullable=False),\n",
254-
" ],\n",
255-
" metadata_json_column=\"metadata\",\n",
256-
" store_metadata=True,\n",
257-
")"
258-
]
259-
},
260232
{
261233
"cell_type": "markdown",
262234
"metadata": {},
@@ -286,21 +258,20 @@
286258
]
287259
},
288260
{
289-
"cell_type": "code",
290-
"execution_count": null,
291-
"metadata": {
292-
"id": "z-AZyzAQ7bsf"
293-
},
294-
"outputs": [],
261+
"cell_type": "markdown",
262+
"metadata": {},
295263
"source": [
296-
"from langchain_google_cloud_sql_pg import PostgresLoader\n",
297-
"\n",
298-
"# Creating a basic PostgreSQL object\n",
299-
"loader = await PostgresLoader.create(\n",
300-
" engine,\n",
301-
" table_name=TABLE_NAME,\n",
302-
" # schema_name=SCHEMA_NAME,\n",
303-
")"
264+
"When creating a `PostgresLoader` to fetch data from Cloud SQL for PostgreSQL, you have two main options for specifying the data to load:\n",
265+
"* Using the `table_name` argument: the loader fetches all the data from the given table.\n",
266+
"* Using the `query` argument: you provide a custom SQL query, which gives you full control over the data that is loaded, including selecting specific columns, applying filters, sorting, and joining tables.\n",
267+
"\n"
268+
]
269+
},
270+
{
271+
"cell_type": "markdown",
272+
"metadata": {},
273+
"source": [
274+
"### Load Documents using the `table_name` argument"
304275
]
305276
},
306277
{
@@ -309,7 +280,7 @@
309280
"id": "PeOMpftjc9_e"
310281
},
311282
"source": [
312-
"### Load Documents via default table\n",
283+
"#### Load Documents via default table\n",
313284
"The loader returns a list of Documents from the table using the first column as page_content and all other columns as metadata. The default table will have the first column as\n",
314285
"page_content and the second column as metadata (JSON). Each row becomes a document. \n",
315286
"\n",
@@ -343,7 +314,7 @@
343314
"id": "kSkL9l1Hc9_e"
344315
},
345316
"source": [
346-
"### Load documents via custom table/metadata or custom page content columns"
317+
"#### Load documents via custom table/metadata or custom page content columns"
347318
]
348319
},
349320
{
@@ -363,6 +334,42 @@
363334
"print(docs)"
364335
]
365336
},
337+
{
338+
"cell_type": "markdown",
339+
"metadata": {},
340+
"source": [
341+
"### Load Documents using a SQL query\n",
342+
"The `query` parameter accepts a custom SQL query, which can include filters so that only specific documents are loaded from the database."
343+
]
344+
},
345+
{
346+
"cell_type": "code",
347+
"execution_count": null,
348+
"metadata": {},
349+
"outputs": [],
350+
"source": [
351+
"table_name = \"products\"\n",
352+
"content_columns = [\"product_name\", \"description\"]\n",
353+
"metadata_columns = [\"id\", \"content\"]\n",
354+
"\n",
355+
"loader = await PostgresLoader.create(\n",
356+
" engine=engine,\n",
357+
" query=f\"SELECT * FROM {table_name};\",\n",
358+
" content_columns=content_columns,\n",
359+
" metadata_columns=metadata_columns,\n",
360+
")\n",
361+
"\n",
362+
"docs = await loader.aload()\n",
363+
"print(docs)"
364+
]
365+
},
366+
{
367+
"cell_type": "markdown",
368+
"metadata": {},
369+
"source": [
370+
"**Note**: If the `content_columns` and `metadata_columns` are not specified, the loader will automatically treat the first returned column as the document’s `page_content` and all subsequent columns as `metadata`."
371+
]
372+
},
366373
{
367374
"cell_type": "markdown",
368375
"metadata": {
@@ -396,10 +403,45 @@
396403
"cell_type": "markdown",
397404
"metadata": {},
398405
"source": [
399-
"### Create PostgresSaver\n",
406+
"## Create PostgresSaver\n",
400407
"The `PostgresSaver` allows for saving of pre-processed documents to the table using the first column as page_content and all other columns as metadata. This table can easily be loaded via a Document Loader or updated to be a VectorStore. The default table will have the first column as page_content and the second column as metadata (JSON)."
401408
]
402409
},
410+
{
411+
"cell_type": "markdown",
412+
"metadata": {},
413+
"source": [
414+
"### Create a table (if it does not already exist)"
415+
]
416+
},
417+
{
418+
"cell_type": "code",
419+
"execution_count": null,
420+
"metadata": {},
421+
"outputs": [],
422+
"source": [
423+
"from langchain_google_cloud_sql_pg import Column\n",
424+
"\n",
425+
"await engine.ainit_document_table(\n",
426+
" table_name=TABLE_NAME,\n",
427+
" content_column=\"product_name\",\n",
428+
" metadata_columns=[\n",
429+
" Column(\"id\", \"SERIAL\", nullable=False),\n",
430+
" Column(\"content\", \"VARCHAR\", nullable=False),\n",
431+
" Column(\"description\", \"VARCHAR\", nullable=False),\n",
432+
" ],\n",
433+
" metadata_json_column=\"metadata\",\n",
434+
" store_metadata=True,\n",
435+
")"
436+
]
437+
},
438+
{
439+
"cell_type": "markdown",
440+
"metadata": {},
441+
"source": [
442+
"### Create PostgresSaver"
443+
]
444+
},
403445
{
404446
"cell_type": "code",
405447
"execution_count": null,
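
Reviewer note on the `document_loader.ipynb` change: the note added in this diff says that, when `content_columns` and `metadata_columns` are omitted, the first returned column becomes `page_content` and all subsequent columns become `metadata`. The mapping rule can be sketched as below, using the standard-library `sqlite3` module as a stand-in for Cloud SQL so that it runs without a database instance. `rows_to_documents` is a hypothetical helper, not part of `langchain_google_cloud_sql_pg`, and joining several content columns with a single space is an assumption made for illustration only.

```python
import sqlite3


def rows_to_documents(cursor, content_columns=None, metadata_columns=None):
    """Map query rows to dicts with 'page_content' and 'metadata' keys.

    Hypothetical helper for illustration; not the library's implementation.
    """
    names = [column[0] for column in cursor.description]
    # Defaults follow the note in the diff: first column is the content,
    # every other returned column is metadata.
    content_columns = content_columns or names[:1]
    metadata_columns = metadata_columns or [
        name for name in names if name not in content_columns
    ]
    documents = []
    for row in cursor.fetchall():
        record = dict(zip(names, row))
        documents.append(
            {
                # Assumption: multiple content columns are joined with a space.
                "page_content": " ".join(str(record[c]) for c in content_columns),
                "metadata": {c: record[c] for c in metadata_columns},
            }
        )
    return documents


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, description TEXT, id INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("Lamp", "Desk lamp", 1, "lighting"), ("Mug", "Ceramic mug", 2, "kitchen")],
)

query = "SELECT * FROM products ORDER BY id"
# No column arguments: the default mapping applies.
default_docs = rows_to_documents(conn.execute(query))
# Explicit column arguments, using the same names as the notebook cell.
custom_docs = rows_to_documents(
    conn.execute(query),
    content_columns=["product_name", "description"],
    metadata_columns=["id", "content"],
)

print(default_docs[0])
print(custom_docs[0])
```

With the default mapping only `product_name` ends up in the content, which is why the notebook cell passes `content_columns` and `metadata_columns` explicitly when it wants both `product_name` and `description` in `page_content`.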

docs/vector_store.ipynb

Lines changed: 31 additions & 0 deletions
@@ -586,6 +586,37 @@
586586
"\n",
587587
"print(docs)"
588588
]
589+
},
590+
{
591+
"cell_type": "markdown",
592+
"metadata": {},
593+
"source": [
594+
"### Search for documents without a Vector Store\n",
595+
"You may want to search documents based on Document metadata, either as a tool or as part of an exploratory workflow. The Document Loader can be used to customize the search and load data from your database in the form of Documents. See [Load Documents using a SQL query](https://github.com/googleapis/langchain-google-cloud-sql-pg-python/blob/main/docs/document_loader.ipynb) for details.\n"
596+
]
597+
},
598+
{
599+
"cell_type": "code",
600+
"execution_count": null,
601+
"metadata": {},
602+
"outputs": [],
603+
"source": [
604+
"from langchain_google_cloud_sql_pg import PostgresLoader\n",
605+
"\n",
606+
"table_name = \"products\"\n",
607+
"content_columns = [\"product_name\", \"description\"]\n",
608+
"metadata_columns = [\"id\", \"content\"]\n",
609+
"\n",
610+
"loader = await PostgresLoader.create(\n",
611+
" engine=engine,\n",
612+
" query=f\"SELECT * FROM {table_name};\",\n",
613+
" content_columns=content_columns,\n",
614+
" metadata_columns=metadata_columns,\n",
615+
")\n",
616+
"\n",
617+
"docs = await loader.aload()\n",
618+
"print(docs)"
619+
]
589620
}
590621
],
591622
"metadata": {
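
Reviewer note on the `vector_store.ipynb` change: the new cell loads every row with `SELECT * FROM {table_name};`, so it does not yet show the metadata-based search that the accompanying text describes. A minimal sketch of such a filtered query is given below, again using the standard-library `sqlite3` module as a stand-in so that it runs anywhere. The `products` rows, the `search_by_metadata` helper, and the choice of `content` as the filter column are all hypothetical. The sketch binds the filter value as a query parameter, which is a `sqlite3` feature; the diff does not say whether the loader's `query` string can be parameterized, so with `PostgresLoader` the `WHERE` clause would presumably be written into the query string itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, product_name TEXT, description TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "Lamp", "Desk lamp", "lighting"),
        (2, "Mug", "Ceramic mug", "kitchen"),
        (3, "Bulb", "LED bulb", "lighting"),
    ],
)


def search_by_metadata(connection, category):
    """Return document-like dicts for rows whose 'content' column matches.

    Hypothetical helper: the WHERE clause does the search, so no embeddings
    or vector index are involved.
    """
    cursor = connection.execute(
        "SELECT product_name, description, id, content FROM products "
        "WHERE content = ? ORDER BY id",
        (category,),  # bound parameter keeps the value out of the SQL text
    )
    return [
        {
            "page_content": f"{name} {description}",
            "metadata": {"id": row_id, "content": content},
        }
        for name, description, row_id, content in cursor
    ]


lighting_docs = search_by_metadata(conn, "lighting")
print([doc["page_content"] for doc in lighting_docs])
```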
