Capability
Give the model your documents.
Language models do not know your contracts, manuals or policies. Vectorization is how Wisej.AI gives them that knowledge: it converts files into vectors that capture meaning, stores them, and at question time retrieves only the passages that are actually relevant. This is the retrieval-augmented generation (RAG) pipeline — built in, not bolted on.
Ingestion runs through a single IngestDocumentAsync call on the SmartHub, backed by four services you can swap independently.
Why retrieval
Grounded answers beat bigger prompts.
Accurate
The model answers from passages retrieved out of your content — and can cite them — instead of improvising from training data.
Economical
Only the relevant chunks travel with the question. You don't pay to send the whole handbook on every request.
Current
Re-ingest a document and the next question is answered from the new version. No fine-tuning cycle, no model release to wait for.
The pipeline
Four swappable stages.
Convert
IDocumentConversionService
A PDF, Word file or stream is turned into plain text the rest of the pipeline can read.
Split
ITextSplitterService
That text is broken into overlapping chunks small enough to embed and retrieve precisely.
Embed
IEmbeddingGenerationService
Each chunk becomes a vector — a numeric fingerprint of its meaning — via the endpoint's embedding model.
Store
IEmbeddingStorageService
Vectors and their text are saved to a collection in the vector store of your choice.
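To make the Split stage concrete, here is a minimal sketch of overlapping chunking. It is an illustration only, not the Wisej.AI implementation: the ChunkingSketch class, the SplitWithOverlap method and the character-based window are assumptions chosen for brevity, and a production splitter would more likely cut on tokens or sentence boundaries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Wisej.AI.
public static class ChunkingSketch
{
    // Splits text into fixed-size character windows. Each window starts
    // (chunkSize - overlap) characters after the previous one, so
    // neighbouring chunks share `overlap` characters.
    public static List<string> SplitWithOverlap(string text, int chunkSize, int overlap)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        int step = chunkSize - overlap;

        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));

            // Stop once a window has reached the end of the text, so the
            // tail is not emitted again as a shorter duplicate chunk.
            if (start + length >= text.Length) break;
        }

        return chunks;
    }
}
```

Calling ChunkingSketch.SplitWithOverlap("abcdefghij", 4, 2) yields "abcd", "cdef", "efgh" and "ghij". The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.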
Ingest once
Hand the hub a file and a collection name. Behind the scenes it converts, splits, embeds and stores — returning an EmbeddedDocument you can query later. Metadata and overwrite behaviour are optional arguments.
// Index a manual into the "guides" collection
EmbeddedDocument doc = await hub.IngestDocumentAsync(
    @"C:\docs\manual.pdf",
    name: "manual",
    collectionName: "guides");
Retrieve by meaning
At question time, embed the query and ask the store for its nearest passages. topN caps how many come back; minSimilarity filters out weak matches. Feed the results into a prompt and the model answers from your content, not its training data.
// Find the passages most relevant to a question
string[] hits = await hub.SimilarityQueryAsync(
    "How do I reset the device?",
    topN: 3,
    minSimilarity: 0.7f);
// Or just get a vector for any text
Embedding vector = await hub.EmbedAsync("reset procedure");
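For intuition about what a similarity query does with topN and minSimilarity, the sketch below ranks stored passages against a query vector. It assumes cosine similarity as the metric, which is a common choice but not confirmed here for any particular Wisej.AI store; SimilaritySketch, Cosine and TopMatches are hypothetical names, not library APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Wisej.AI.
public static class SimilaritySketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // 1 means the vectors point the same way, 0 means they are orthogonal.
    public static float Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction, so treat it as matching nothing.
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Returns up to topN passages whose similarity to the query is at
    // least minSimilarity, best match first.
    public static List<string> TopMatches(
        float[] query,
        IEnumerable<(string Text, float[] Vector)> stored,
        int topN,
        float minSimilarity)
    {
        return stored
            .Select(item => (item.Text, Score: Cosine(query, item.Vector)))
            .Where(scored => scored.Score >= minSimilarity)
            .OrderByDescending(scored => scored.Score)
            .Take(topN)
            .Select(scored => scored.Text)
            .ToList();
    }
}
```

The threshold is applied before the cap, so a query with few good matches returns fewer than topN passages instead of padding the result with weak ones.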
Vector stores
Keep vectors where it suits you.
The storage step is an interface — IEmbeddingStorageService — with implementations from a quick in-memory store to managed cloud databases. Start in memory during development, move to a hosted store for production, change nothing else.
In-memory
MemoryEmbeddingStorageService
Fast, ephemeral — ideal for tests and small sets.
File system
FileSystemEmbeddingStorageService
Zero-dependency persistence on local disk.
Pinecone
PineconeEmbeddingStorageService
Managed, hosted vector database.
Qdrant
QdrantEmbeddingStorageService
Open-source vector engine, self-host or cloud.
Chroma
ChromaEmbeddingStorageService
Lightweight embedding database.
Azure AI Search
AzureAISearchEmbeddingStorageService
Enterprise search with vector support.
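The reason a store can be swapped without touching anything else is that callers depend only on an interface. The sketch below shows that pattern in miniature. Every type in it (IChunkStore, InMemoryChunkStore, Indexer) is hypothetical and far simpler than IEmbeddingStorageService, whose actual members are not shown on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a storage abstraction; NOT the Wisej.AI interface.
public interface IChunkStore
{
    void Save(string collection, string text, float[] vector);
    IReadOnlyList<(string Text, float[] Vector)> Load(string collection);
}

// Ephemeral implementation: everything lives in a dictionary and is
// lost when the process exits.
public sealed class InMemoryChunkStore : IChunkStore
{
    private readonly Dictionary<string, List<(string Text, float[] Vector)>> _collections =
        new Dictionary<string, List<(string Text, float[] Vector)>>();

    public void Save(string collection, string text, float[] vector)
    {
        if (!_collections.TryGetValue(collection, out var items))
        {
            items = new List<(string Text, float[] Vector)>();
            _collections[collection] = items;
        }
        items.Add((text, vector));
    }

    public IReadOnlyList<(string Text, float[] Vector)> Load(string collection)
    {
        // An unknown collection is reported as empty, not as an error.
        return _collections.TryGetValue(collection, out var items)
            ? items
            : (IReadOnlyList<(string Text, float[] Vector)>)Array.Empty<(string Text, float[] Vector)>();
    }
}

// The caller sees only the interface, so a disk-backed or hosted store
// could be passed in here without changing this class.
public sealed class Indexer
{
    private readonly IChunkStore _store;

    public Indexer(IChunkStore store)
    {
        _store = store ?? throw new ArgumentNullException(nameof(store));
    }

    public void Index(string collection, string text, float[] vector)
    {
        _store.Save(collection, text, vector);
    }
}
```

Moving from development to production then comes down to which IChunkStore instance is handed to the Indexer constructor.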