Capability

Give the model your documents.

Language models do not know your contracts, manuals or policies. Vectorization is how Wisej.AI gives them that knowledge: it converts files into vectors that capture meaning, stores them, and at question time retrieves only the passages that are actually relevant. This is the retrieval-augmented generation (RAG) pipeline — built in, not bolted on.

Ingestion runs through a single IngestDocumentAsync call on the SmartHub, backed by four services you can swap independently.

Why retrieval

Grounded answers beat bigger prompts.

Accurate

The model answers from passages retrieved out of your content — and can cite them — instead of improvising from training data.

Economical

Only the relevant chunks travel with the question. You don't pay to send the whole handbook on every request.

Current

Re-ingest a document and the knowledge updates as soon as indexing finishes. No fine-tuning cycle, no model release to wait for.

The pipeline

Four swappable stages.

1. Convert

IDocumentConversionService

A PDF, Word file or stream is turned into plain text the rest of the pipeline can read.

2. Split

ITextSplitterService

That text is broken into overlapping chunks small enough to embed and retrieve precisely.

3. Embed

IEmbeddingGenerationService

Each chunk becomes a vector — a numeric fingerprint of its meaning — via the endpoint's embedding model.

4. Store

IEmbeddingStorageService

Vectors and their text are saved to a collection in the vector store of your choice.
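
To make the Split stage concrete, here is a minimal sketch of overlapping chunking in plain C#. It is not the ITextSplitterService implementation: the ChunkSketch class, the Split method and the character-based chunkSize and overlap parameters are hypothetical names chosen for illustration, and a real splitter may well cut on sentences or tokens instead of raw characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Wisej.AI.
public static class ChunkSketch
{
    // Cuts text into windows of at most chunkSize characters.
    // Each window starts (chunkSize - overlap) characters after the previous one,
    // so consecutive chunks share `overlap` characters of context.
    public static List<string> Split(string text, int chunkSize, int overlap)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        int step = chunkSize - overlap;

        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));

            // Stop once a window has reached the end of the text,
            // otherwise the tail would be emitted again as a shorter chunk.
            if (start + length >= text.Length) break;
        }

        return chunks;
    }
}
```

The overlap is what keeps a sentence that straddles a boundary retrievable: it appears whole in at least one of the two neighbouring chunks as long as it is shorter than the overlap.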

Ingest once

Hand the hub a file and a collection name. Behind the scenes it converts, splits, embeds and stores — returning an EmbeddedDocument you can query later. Metadata and overwrite behaviour are optional arguments.

csharp
// Index a manual into the "guides" collection
EmbeddedDocument doc = await hub.IngestDocumentAsync(
    @"C:\docs\manual.pdf",
    name: "manual",
    collectionName: "guides");

Retrieve by meaning

At question time, embed the query and ask the store for its nearest passages. topN caps how many come back; minSimilarity filters out weak matches. Feed the results into a prompt and the model answers from your content, not its training data.

The SmartDocumentAdapter wraps this in a chat experience.

csharp
// Find the passages most relevant to a question
string[] hits = await hub.SimilarityQueryAsync(
    "How do I reset the device?",
    topN: 3,
    minSimilarity: 0.7f);

// Or just get a vector for any text
Embedding vector = await hub.EmbedAsync("reset procedure");
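
For readers who want to see what topN and minSimilarity mean in practice, the following is a rough sketch of a nearest-passage query over vectors held in memory. It assumes cosine similarity as the metric, which the page does not state, and the SimilaritySketch class with its Cosine and Query methods is hypothetical rather than part of the Wisej.AI API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Wisej.AI.
public static class SimilaritySketch
{
    // Cosine similarity: dot product divided by the product of the vector lengths.
    // Returns 0 when either vector has zero length.
    public static float Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Scores every stored passage against the query vector, drops scores
    // below minSimilarity, and returns the text of the topN best matches.
    public static List<string> Query(
        float[] query,
        IReadOnlyList<(string Text, float[] Vector)> passages,
        int topN,
        float minSimilarity)
    {
        return passages
            .Select(p => (p.Text, Score: Cosine(query, p.Vector)))
            .Where(p => p.Score >= minSimilarity)
            .OrderByDescending(p => p.Score)
            .Take(topN)
            .Select(p => p.Text)
            .ToList();
    }
}
```

The filter runs before the cap, so a query with no strong matches returns fewer than topN passages, possibly none, rather than padding the result with weak ones.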

Vector stores

Keep vectors where it suits you.

The storage step is an interface — IEmbeddingStorageService — with implementations from a quick in-memory store to managed cloud databases. Start in memory during development, move to a hosted store for production, change nothing else.

In-memory

MemoryEmbeddingStorageService

Fast, ephemeral — ideal for tests and small sets.

File system

FileSystemEmbeddingStorageService

Zero-dependency persistence on local disk.

Pinecone

PineconeEmbeddingStorageService

Managed, hosted vector database.

Qdrant

QdrantEmbeddingStorageService

Open-source vector engine, self-host or cloud.

Chroma

ChromaEmbeddingStorageService

Lightweight embedding database.

Azure AI Search

AzureAISearchEmbeddingStorageService

Enterprise search with vector support.