{% tabs %} {% tab title="Python " %}
Requires Python > 3.8.1.

pip install pgml
{% endtab %}
{% tab title="JavaScript " %}
npm i pgml
{% endtab %} {% endtabs %}
Once the SDK is installed, you can use the following example to get started.
{% tabs %} {% tab title="Python" %}
from pgml import Collection, Model, Splitter, Pipeline
import asyncio
async def main():
    # Initialize collection
    collection = Collection("sample_collection")
{% endtab %}
{% tab title="JavaScript " %}
const pgml = require("pgml");
const main = async () => {
  // Initialize collection
  const collection = pgml.newCollection("sample_collection");
{% endtab %} {% endtabs %}
Explanation:
- The code imports the pgml module.
- It creates an instance of the Collection class, to which we will add pipelines and documents.
Continuing with main
{% tabs %} {% tab title="Python" %}
    # Create a pipeline using the default model and splitter
    model = Model()
    splitter = Splitter()
    pipeline = Pipeline("sample_pipeline", model, splitter)
    await collection.add_pipeline(pipeline)
{% endtab %}
{% tab title="JavaScript" %}
  // Create a pipeline using the default model and splitter
  const model = pgml.newModel();
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline("sample_pipeline", model, splitter);
  await collection.add_pipeline(pipeline);
{% endtab %} {% endtabs %}
Explanation:
- The code creates instances of Model and Splitter using their default arguments.
- Finally, the code constructs a pipeline called "sample_pipeline" and adds it to the collection we initialized above. This pipeline automatically generates chunks and embeddings for every upserted document.
Continuing with main
{% tabs %} {% tab title="Python" %}
    # Create and upsert some filler documents
    documents = [
        {
            "id": "Document One",
            "text": "document one contents...",
        },
        {
            "id": "Document Two",
            "text": "document two contents...",
        },
    ]
    await collection.upsert_documents(documents)
{% endtab %}
{% tab title="JavaScript" %}
  const documents = [
    {
      id: "Document One",
      text: "document one contents...",
    },
    {
      id: "Document Two",
      text: "document two contents...",
    },
  ];
  await collection.upsert_documents(documents);
{% endtab %} {% endtabs %}
Explanation:
- This code creates and upserts some filler documents.
- As mentioned above, the pipeline added earlier automatically runs and generates chunks and embeddings for each document.
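To make "chunks and embeddings for each document" more concrete, here is a toy, in-memory sketch of the idea. It is not how pgml works internally: ToyCollection, split_into_chunks, and toy_embedding are hypothetical names, the word-count splitter and hash-based vectors merely stand in for a real splitter and model, and replacing a document that shares an existing id is an assumption about upsert semantics.

```python
import hashlib
import math
from typing import Dict, List


def split_into_chunks(text: str, max_words: int = 5) -> List[str]:
    """Toy splitter: group whitespace-separated words into fixed-size chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embedding(chunk: str, dims: int = 8) -> List[float]:
    """Toy embedding: hash each word into one of `dims` buckets, then L2-normalize.

    A real model produces learned vectors; this only mimics the shape of the output.
    """
    vector = [0.0] * dims
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class ToyCollection:
    """In-memory stand-in for a collection with a single pipeline attached."""

    def __init__(self) -> None:
        self.documents: Dict[str, dict] = {}
        self.chunks: Dict[str, List[dict]] = {}

    def upsert_documents(self, documents: List[dict]) -> None:
        for document in documents:
            # Upsert: a new id is inserted, an existing id is replaced (assumed semantics).
            self.documents[document["id"]] = document
            # The toy "pipeline" reruns for this document and replaces its old chunks.
            self.chunks[document["id"]] = [
                {"chunk": chunk, "embedding": toy_embedding(chunk)}
                for chunk in split_into_chunks(document["text"])
            ]


collection = ToyCollection()
collection.upsert_documents([
    {"id": "Document One", "text": "one two three four five six seven"},
    {"id": "Document Two", "text": "document two contents..."},
])
# Upserting the same id again updates the document instead of duplicating it.
collection.upsert_documents([{"id": "Document One", "text": "revised contents"}])

for doc_id, chunks in collection.chunks.items():
    print(doc_id, [c["chunk"] for c in chunks])
```

The point of the sketch is the ordering: every upsert is followed by splitting and embedding, so searchable chunks always reflect the latest version of each document.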
Continuing with main
{% tabs %} {% tab title="Python" %}
    # Query
    query = "Some user query that will match document one first"
    results = await collection.query().vector_recall(query, pipeline).limit(2).fetch_all()
    print(results)

    # Archive collection
    await collection.archive()
{% endtab %}
{% tab title="JavaScript" %}
  // Query
  const queryResults = await collection
    .query()
    .vector_recall("Some user query that will match document one first", pipeline)
    .limit(2)
    .fetch_all();

  // Convert the results to an array of objects
  const results = queryResults.map((result) => {
    const [similarity, text, metadata] = result;
    return {
      similarity,
      text,
      metadata,
    };
  });
  console.log(results);

  // Archive collection
  await collection.archive();
};
{% endtab %} {% endtabs %}
Explanation:
- The query method is called to perform a vector-based search on the collection. The query string is "Some user query that will match document one first", and the top 2 results are requested.
- In the JavaScript example, the search results are converted to objects before being printed; the Python example prints them as returned.
- Finally, the archive method is called to archive the collection and free up resources in the PostgresML database.
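The Python tab prints the raw results, while the JavaScript tab labels each field first. A Python equivalent of that conversion might look like the sketch below. It assumes each result is a (similarity, text, metadata) triple, as the JavaScript destructuring suggests; results_to_dicts is a hypothetical helper rather than part of the SDK, and the sample rows are copied from this guide's example output instead of being fetched from a database.

```python
from typing import Any, Dict, List, Sequence


def results_to_dicts(results: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn (similarity, text, metadata) triples into labeled dictionaries."""
    return [
        {"similarity": similarity, "text": text, "metadata": metadata}
        for similarity, text, metadata in results
    ]


# Sample rows shaped like the triples the JavaScript tab destructures.
sample_results = [
    (0.8506832955692104, "document one contents...", {"id": "Document One"}),
    (0.8066114609244565, "document two contents...", {"id": "Document Two"}),
]

for row in results_to_dicts(sample_results):
    print(row)
```

Inside main, the same helper could be applied to the value returned by fetch_all() before printing, provided the rows really have that three-field shape.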
Call the main function:
{% tabs %} {% tab title="Python" %}
if __name__ == "__main__":
    asyncio.run(main())
{% endtab %}
{% tab title="JavaScript" %}
main().then(() => {
  console.log("Done with PostgresML demo");
});
{% endtab %} {% endtabs %}
Save the code to a file named vector_search.py (Python) or vector_search.js (JavaScript). Then open a terminal or command prompt, navigate to the directory where the file is saved, and execute the following command:
{% tabs %} {% tab title="Python" %}
python vector_search.py{% endtab %}
{% tab title="JavaScript" %}
node vector_search.js{% endtab %} {% endtabs %}
You should see the search results printed in the terminal. As you can see, our vector search engine did match document one first.
[
  {
    similarity: 0.8506832955692104,
    text: 'document one contents...',
    metadata: { id: 'Document One' }
  },
  {
    similarity: 0.8066114609244565,
    text: 'document two contents...',
    metadata: { id: 'Document Two' }
  }
]
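The similarity values above determine the order of the results. As a rough illustration of how such a ranking can be produced, the sketch below scores stored vectors against a query vector with cosine similarity and keeps the top two. Cosine similarity is an assumption here (this guide does not state which metric the SDK uses), the three-dimensional vectors are made up for illustration, and cosine_similarity and rank_by_similarity are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    limit: int = 2,
) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the `limit` best, highest first."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:limit]


# Hypothetical embeddings, invented purely for illustration.
chunks = [
    ("document two contents...", [0.6, 0.8, 0.0]),
    ("document one contents...", [0.9, 0.1, 0.1]),
    ("unrelated text", [0.0, 0.0, 1.0]),
]
query_vector = [1.0, 0.2, 0.0]

ranked = rank_by_similarity(query_vector, chunks, limit=2)
for similarity, text in ranked:
    print(round(similarity, 4), text)
```

With these made-up vectors, the chunk pointing in nearly the same direction as the query ranks first, which mirrors how document one outranks document two in the output above.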