Support

Frequently Asked Questions

Find answers to common questions about Schemantic, our features, security, and how we help you discover data relationships.

Schemantic runs across the major enterprise warehouses — BigQuery, Databricks, Redshift, and Snowflake — and inside Google Cloud, AWS, Azure, or on-premise. The nine-element catalog is the same on every warehouse, so the catalog you install today travels if you switch stacks.

A context layer supplies most of what, and perhaps everything, an analyst or AI agent needs to understand your warehouse: not just what to query, but what the data means, how it connects, and whether it's reliable. A semantic layer is one piece of that. It defines business metrics, validated join paths, access controls, and terminology in a governed model. A full context layer adds data quality signals, entity resolution across source systems, lineage metadata, and patterns discovered from query history. Applications query the context layer instead of raw tables, eliminating ambiguity and improving consistency.

Schemantic finds virtually every valid join between any two columns that reference the same real-world entity. We estimate we capture 99.9%+ of practical joins for most customers. Edge cases include: (1) Creatively mangled IDs with random characters inserted or encryption, (2) True decimal values used as IDs without common naming conventions, (3) Column pairs with highly unfavorable statistical and metadata properties. A "valid join" links matching records for the same entity in two different tables. A "core entity" is a dataset containing at least one ID column where each distinct value represents a distinct instance of that entity. Entity detection requires either common ID-naming conventions or more than five tables with valid joins using the ID column.

Schemantic accesses your warehouse via read-only, per-workspace service-account credentials (OAuth 2.0 / IAM), and all calculations on raw data execute inside your cloud environment. Row-level data and table contents — including values like minimums, maximums, and modes — do not leave it; where the UI displays such a value, it is passed through at render time and never stored externally. The only information temporarily held outside your cloud is structural metadata and aggregate statistics (schemas, types, null rates, row counts, join scores) — encrypted at rest and in transit, hashed where applicable, held under strict tenant isolation with no co-mingling across customers, and deleted after subscription termination. Schemantic is SOC 2 Type II attested across all five Trust Service Criteria.

No — but virtually every column that could plausibly have valid joins is checked. Excluded: (1) Non-numeric/non-string data types (dates, booleans, arrays), (2) Columns whose names indicate quantities or freeform text rather than identifiers, (3) Columns with statistical properties precluding quality joins (>99.5% null, only 1 distinct value). Exclusion reasons are visible in the Specific Join Finder tool.
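As a rough illustration of the exclusion rules above, the sketch below filters columns by data type, null rate, and distinct-value count. It is not Schemantic's implementation: `ColumnProfile`, `exclusion_reason`, and the type labels are hypothetical names, and the name-based rule (2) is left out because the source does not say how names are classified.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the exclusion list above; the names are hypothetical.
MAX_NULL_RATE = 0.995
MIN_DISTINCT_VALUES = 2
JOINABLE_TYPES = {"numeric", "string"}  # assumed type labels


@dataclass
class ColumnProfile:
    name: str
    data_type: str
    null_rate: float      # fraction of rows that are null, 0.0 to 1.0
    distinct_count: int   # number of distinct non-null values


def exclusion_reason(col: ColumnProfile) -> Optional[str]:
    """Return why a column would be skipped, or None if it stays eligible."""
    if col.data_type not in JOINABLE_TYPES:
        return "unsupported data type"
    if col.null_rate > MAX_NULL_RATE:
        return "more than 99.5% null"
    if col.distinct_count < MIN_DISTINCT_VALUES:
        return "only one distinct value"
    return None
```

For example, `exclusion_reason(ColumnProfile("created_at", "date", 0.0, 100))` returns `"unsupported data type"`, while a well-populated string ID column returns `None`.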

By ingesting data directly and generating comprehensive descriptive statistics at table and column level, Schemantic builds deep understanding of your data environment. It identifies tables containing data for each entity, provides rich per-column statistics, and offers a collaborative interface for validating which views and tables are essential, duplicative, or ready for consolidation — informing reverse-engineering, stakeholder alignment, and cleanup workflows.

User-approved joins rank highest. Our algorithm then orders the remaining joins that have not been user-approved.
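One way to picture this ordering is a two-part sort key: approval status first, then score. The sketch below is an assumption-laden illustration, not the product's ranking code. The `Join` record, its `score` field, and the choice to order approved joins among themselves by score are all hypothetical.

```python
from typing import List, NamedTuple


class Join(NamedTuple):
    left: str
    right: str
    score: float         # hypothetical algorithmic score, higher is better
    user_approved: bool


def rank_joins(joins: List[Join]) -> List[Join]:
    # False sorts before True, so "not user_approved" puts approved joins first.
    # Within each group, the negated score gives descending order.
    return sorted(joins, key=lambda j: (not j.user_approved, -j.score))
```

With this key, an approved join scoring 0.4 still ranks ahead of an unreviewed join scoring 0.95.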

Yes, if all data uses the same deterministic encryption/hashing algorithm. Schemantic matches values regardless of content — it only checks whether values match. Exceptions: (1) Mixed encrypted and unencrypted data, (2) Non-deterministic hashing where the same input produces different outputs.
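The difference between deterministic and non-deterministic hashing can be seen in a small sketch. It uses SHA-256 from Python's standard library purely as an example algorithm; the helper names and sample IDs are made up for illustration and say nothing about which algorithms your data uses.

```python
import hashlib
import os


def deterministic_hash(value: str) -> str:
    # Same input always yields the same digest.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def salted_hash(value: str) -> str:
    # A fresh random salt per call makes the output differ every time.
    salt = os.urandom(16)
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


ids = ["cust-001", "cust-002", "cust-003"]

# Two columns hashed with the same deterministic algorithm still overlap.
left = {deterministic_hash(v) for v in ids}
right = {deterministic_hash(v) for v in ids[1:]}
overlap = left & right

# The same IDs hashed with per-call salts no longer match each other.
salted_left = {salted_hash(v) for v in ids}
salted_right = {salted_hash(v) for v in ids}
salted_overlap = salted_left & salted_right
```

Here `overlap` contains the two shared customers, while `salted_overlap` is empty (barring an astronomically unlikely collision), so value matching has nothing to work with.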

No. Schemantic handles non-matching column names (customer_id vs. custNum), databases without foreign keys, and minimal documentation. Statistical analysis and pattern detection are the primary indicators, not naming conventions or constraints.

Yes. The path-finding tool constructs multi-step join paths in seconds by chaining high-quality direct joins. Each path includes a visual diagram and auto-generated SQL ready to copy.
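Chaining direct joins into a multi-step path can be thought of as a shortest-path search over a graph whose nodes are tables. The sketch below uses a plain breadth-first search as a stand-in; the source does not describe the actual algorithm, and `find_join_path` and the table names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def find_join_path(
    direct_joins: Iterable[Tuple[str, str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search over an undirected graph of direct joins.

    Returns the tables along a shortest path from start to goal,
    or None when no chain of joins connects them.
    """
    graph: Dict[str, Set[str]] = {}
    for a, b in direct_joins:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort neighbors so the result is stable between runs.
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

Given direct joins `orders`-`customers`, `orders`-`order_items`, and `order_items`-`products`, calling `find_join_path(joins, "customers", "products")` walks through `orders` and `order_items`. A real path-finder would presumably also weigh join quality, which this sketch ignores.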

Generally yes — every table selected during onboarding. Tables with 0 or 1 rows are skipped. Tables not included in onboarding are excluded unless explicitly added.

Single-row tables provide no meaningful joins (no matching keys to compare). Aggregated stats on 1-row tables can also leak row data, creating security concerns.

You can declare a column as an ID column, triggering re-checking against all other ID columns. The homepage table of approved/disapproved joins lets you override low-scoring joins. Once approved, the join appears in the map, pathfinding, and auto-generated queries.

Most runs complete within several hours, many within an hour. The primary factor is the number of tables and columns, not data volume: scanning five 100GB tables is faster than scanning 500 1GB tables because there are fewer column pairs to evaluate.
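A quick back-of-the-envelope calculation shows why table count dominates. The sketch below counts the upper bound on cross-table column pairs, assuming (purely for illustration) 20 columns per table and that every column is a candidate. In practice the exclusion rules reduce both numbers.

```python
from math import comb

COLUMNS_PER_TABLE = 20  # assumed for illustration; not a figure from Schemantic


def candidate_pairs(num_tables: int, columns_per_table: int = COLUMNS_PER_TABLE) -> int:
    """Upper bound on column pairs that span two different tables."""
    # Choose two distinct tables, then one column from each.
    return comb(num_tables, 2) * columns_per_table ** 2


few_large = candidate_pairs(5)     # five 100GB tables
many_small = candidate_pairs(500)  # 500 1GB tables
```

Under these assumptions the five-table case yields 4,000 candidate pairs and the 500-table case yields 49,900,000, even though both hold the same 500GB of data.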

The map only shows joins connecting tables currently in view to avoid visual clutter.

Approving or rejecting determines display in the map and use in path generation. If unsure, leave unreviewed. You can reverse decisions later, but actions are visible to other users.

An entity is a real-world person, place, or thing referenced in your data (customers, products, transactions). Schemantic groups columns that point to the same entity. The Entity Explorer shows joinable tables found for an entity, with pre-generated statistics. This accelerates understanding new datasets and building entity-focused analyses.

A join indicates two columns share the same identifier: they reference the same real-world entity. Rows that match on that ID describe the same object.

Standard statistics (min, max, mean, variance, skew, kurtosis) plus functional type detection, histograms, and automatic key detection. Since Schemantic already scans data for joins, providing descriptive stats adds immediate value with negligible extra cost.
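For readers unfamiliar with the moment-based statistics in that list, here is a minimal sketch of how they can be computed for one numeric column. It uses population formulas and non-excess kurtosis as assumptions; the source does not say which conventions Schemantic applies, and `describe` is a hypothetical helper.

```python
import statistics
from typing import Dict, Optional, Sequence


def describe(values: Sequence[float]) -> Dict[str, Optional[float]]:
    """Population statistics for a non-empty numeric column."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)

    skew: Optional[float] = None
    kurtosis: Optional[float] = None
    if variance > 0:
        # Third and fourth central moments, standardized by the variance.
        m3 = sum((x - mean) ** 3 for x in values) / n
        m4 = sum((x - mean) ** 4 for x in values) / n
        skew = m3 / variance ** 1.5
        kurtosis = m4 / variance ** 2  # non-excess: a normal distribution gives 3

    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "variance": variance,
        "skew": skew,
        "kurtosis": kurtosis,
    }
```

Skew and kurtosis are left as `None` when every value is identical, since both are undefined at zero variance.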

Most are exactly calculated then rounded. Some leverage random sampling or close approximations to reduce runtime and query costs. Error margins are typically smaller than rounding effects.

Hours from initial authorization to viewing valid joins for many use cases. Large or complex datasets may take a few days. Extremely large environments (dozens or hundreds of terabytes) may take proportionately longer.

For petabyte-scale environments or further clarification, contact hello@schemantic.io.

Edit workspace data sources from the Join Map page, select or deselect relevant data, save, and click "Join your data" to rerun inference.

Four times per day to prevent accidental overuse. Request manual refreshes via hello@schemantic.io.

For Google Cloud: (1) Request project or organization-level permissions to avoid per-table checks, (2) Request an increased IAM policy quota from Google Cloud (5,000 is usually sufficient vs. the 1,500 default).

Schemantic scales with your cloud provider's capacity. Many terabytes is routine. Contact hello@schemantic.io for extreme-scale environments.

Most comparable tools assume you already know how your data joins, or capture joins at the ELT/ETL level and recreate them per pipeline. Schemantic discovers joins from raw data and provides a collaborative interface for documenting and constructing SQL from those joins.

No. For recently ingested data, refresh and rerun. Data is current through the time the run was initiated. Allow 15 minutes for new data to be available before initiating.

Service accounts provide more precise, restricted IAM controls. OAuth would allow broader user impersonation, which is less secure. The service-account approach is harder to engineer but safer for customers.

Most questions are addressed in the FAQ and documentation. Additional support is available via hello@schemantic.io.

Follow your company's data security policies. The simplest option is granting Schemantic user licenses — cost-effective and the safest way to share data relationships.

Discovering and validating table joins is tedious and error-prone across all data roles. Missing or erroneous joins hurt ML feature sets, dashboards, and downstream decisions. Schemantic automates the process.

For enterprise support, contact hello@schemantic.io.

Email hello@schemantic.io with ideas or requests.