Support

Frequently Asked Questions

Find answers to common questions about Schemantic, our features, security, and how we help you discover data relationships.

FAQ 01 What platforms does Schemantic run on?

Schemantic runs on the major enterprise warehouses (BigQuery, Databricks, Redshift, and Snowflake) and inside Google Cloud, AWS, Azure, or on-premises environments. The nine-element catalog is the same on every warehouse, so the catalog you install today travels with you if you switch stacks.

FAQ 02 What is a context layer and how does it relate to a semantic layer?

A context layer is everything, or nearly everything, an analyst or AI agent needs to understand your warehouse: not just what to query, but what the data means, how it connects, and whether it is reliable. A semantic layer is one piece of that. It defines things such as business metrics, validated join paths, access controls, and terminology in a governed model. A full context layer adds data quality signals, entity resolution across source systems, lineage metadata, and patterns discovered from query history. Applications query the context layer instead of raw tables, which eliminates ambiguity and improves consistency.

FAQ 03 What does "find virtually every valid join and core entity" mean?

Schemantic finds virtually every valid join between any two columns that reference the same real-world entity. We estimate that we capture more than 99.9% of practical joins for most customers. Edge cases include: (1) creatively mangled IDs, such as values with random characters inserted or values encrypted inconsistently (see FAQ 08), (2) genuine decimal values used as IDs in columns that lack common ID naming conventions, and (3) column pairs with highly unfavorable statistical and metadata properties. A "valid join" links matching records for the same entity in two different tables. A "core entity" is a dataset containing at least one ID column in which each distinct value represents a distinct instance of that entity. Detecting an entity requires either common ID naming conventions or more than five tables with valid joins on the ID column.
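
The entity-detection rule above can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not Schemantic's implementation: the function name `is_detectable_entity_id` and the regular expression standing in for "common ID naming conventions" are hypothetical, and the number of tables with valid joins is assumed to be known already.

```python
import re

# Hypothetical stand-in for "common ID naming conventions"; the real list is not documented here.
ID_NAME_PATTERN = re.compile(r"(?:^|_)id$|^id_|key$|num$", re.IGNORECASE)


def is_detectable_entity_id(column_name: str, tables_with_valid_joins: int) -> bool:
    """Apply the stated rule: an ID-style name OR valid joins to more than five tables."""
    has_id_style_name = bool(ID_NAME_PATTERN.search(column_name))
    return has_id_style_name or tables_with_valid_joins > 5


print(is_detectable_entity_id("customer_id", 0))  # True: the naming convention alone is enough
print(is_detectable_entity_id("region", 5))       # False: no ID-style name, only five joined tables
print(is_detectable_entity_id("region", 6))       # True: more than five joined tables
```

Note the "more than five" threshold: a column joined to exactly five tables does not qualify on that basis alone.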

FAQ 04 How does Schemantic securely process your data?

Schemantic accesses your warehouse via read-only, per-workspace service-account credentials (OAuth 2.0 / IAM), and all calculations on raw data execute inside your cloud environment. Row-level data and table contents — including values like minimums, maximums, and modes — do not leave it; where the UI displays such a value, it is passed through at render time and never stored externally. The only information temporarily held outside your cloud is structural metadata and aggregate statistics (schemas, types, null rates, row counts, join scores) — encrypted at rest and in transit, hashed where applicable, held under strict tenant isolation with no co-mingling across customers, and deleted after subscription termination. Schemantic is SOC 2 Type II attested across all five Trust Service Criteria.

FAQ 05 Do you include every column in your join search?

No, but virtually every column that could plausibly have valid joins is checked. Excluded columns are: (1) columns with non-numeric, non-string data types (dates, booleans, arrays), (2) columns whose names indicate quantities or freeform text rather than identifiers, and (3) columns whose statistical properties preclude quality joins (for example, more than 99.5% null, or only one distinct value). The reason for each exclusion is visible in the Specific Join Finder tool.
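
The three exclusion rules can be expressed as a filter that returns a reason, or `None` when a column stays in the search. This is a sketch only: the `ColumnProfile` fields, the BigQuery-style type names, and the name pattern for quantities and free text are illustrative assumptions; only the 99.5% null threshold and the single-distinct-value rule come from the answer above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnProfile:
    name: str
    data_type: str       # assumed BigQuery-style type name, e.g. "STRING", "INT64", "DATE"
    row_count: int
    null_count: int
    distinct_count: int


# Assumed set of numeric and string types; dates, booleans, and arrays are left out.
JOINABLE_TYPES = {"STRING", "INT64", "NUMERIC", "BIGNUMERIC", "FLOAT64"}

# Hypothetical name patterns for quantities and freeform text (whole word or "_suffix").
NON_ID_NAME = re.compile(
    r"(?:^|_)(amount|qty|quantity|count|total|price|description|comment|notes?)$",
    re.IGNORECASE,
)


def exclusion_reason(col: ColumnProfile) -> Optional[str]:
    """Return why a column is skipped, or None if it stays in the join search."""
    if col.data_type.upper() not in JOINABLE_TYPES:
        return "unsupported data type"
    if NON_ID_NAME.search(col.name):
        return "name suggests a quantity or free text"
    # Treat an empty column as fully null.
    null_rate = col.null_count / col.row_count if col.row_count else 1.0
    if null_rate > 0.995:
        return "more than 99.5% null"
    if col.distinct_count <= 1:
        return "only one distinct value"
    return None


columns = [
    ColumnProfile("customer_id", "STRING", 1000, 0, 1000),
    ColumnProfile("signup_date", "DATE", 1000, 0, 900),
    ColumnProfile("order_amount", "NUMERIC", 1000, 0, 800),
    ColumnProfile("legacy_ref", "STRING", 1000, 996, 4),
    ColumnProfile("tenant", "STRING", 1000, 0, 1),
]
for col in columns:
    print(col.name, exclusion_reason(col))
```

Anchoring the name pattern at the start of the name or after an underscore keeps identifiers such as `account` from being mistaken for a `count` column.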

FAQ 06 How does Schemantic help you untangle a fragmented data warehouse?

By analyzing your data in place and generating comprehensive descriptive statistics at the table and column level, Schemantic builds a detailed picture of your data environment. It identifies the tables that contain data for each entity, provides rich per-column statistics, and offers a collaborative interface for deciding which views and tables are essential, duplicative, or ready for consolidation. Those decisions inform reverse-engineering, stakeholder alignment, and cleanup workflows.

FAQ 07 How do you rank joins?

User-approved joins rank highest; our algorithm orders the remaining joins, which no user has approved.
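
One way to express this two-tier ordering is a tuple sort key. The `Join` fields, including a numeric `score` standing in for the algorithm's output, are assumptions for illustration; how the real algorithm scores joins, and how approved joins are ordered among themselves, is not described here (this sketch orders both tiers by score).

```python
from dataclasses import dataclass


@dataclass
class Join:
    left: str
    right: str
    score: float                 # hypothetical algorithm score; higher means a better join
    user_approved: bool = False


def rank_joins(joins: list[Join]) -> list[Join]:
    # False sorts before True, so approved joins come first; -score puts higher scores first.
    return sorted(joins, key=lambda j: (not j.user_approved, -j.score))


joins = [
    Join("orders.customer_id", "customers.id", 0.95),
    Join("events.acct", "accounts.id", 0.40, user_approved=True),
    Join("orders.id", "order_items.order_id", 0.80),
]
for j in rank_joins(joins):
    print(j.left, j.right, j.score, j.user_approved)
```

The approved join lands first despite having the lowest score, which is the point of the two tiers.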

FAQ 08 Can Schemantic find joins in encrypted or hashed data?

Yes, provided every column involved uses the same deterministic encryption or hashing scheme (the same algorithm and, where applicable, the same key or salt). Schemantic does not need to interpret the content; it only checks whether values match. Exceptions: (1) columns that mix encrypted and unencrypted values, and (2) non-deterministic schemes, where the same input produces different outputs.
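
The difference between deterministic and non-deterministic hashing is easy to see with Python's standard `hashlib`. The sample ID lists are made up, and SHA-256 with a fresh random salt per value is used here only as an example of a non-deterministic scheme.

```python
import hashlib
import os


def deterministic_hash(value: str) -> str:
    # Same input, same digest: equal IDs in two tables still compare equal.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def randomized_hash(value: str) -> str:
    # Fresh random salt on every call: equal inputs yield different digests.
    salt = os.urandom(16)
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def overlap(left: list[str], right: list[str]) -> int:
    """Number of distinct values present in both columns."""
    return len(set(left) & set(right))


customers = ["C001", "C002", "C003"]   # made-up customer IDs
orders = ["C002", "C003", "C003"]      # made-up foreign keys in an orders table

det_overlap = overlap([deterministic_hash(v) for v in customers],
                      [deterministic_hash(v) for v in orders])
rand_overlap = overlap([randomized_hash(v) for v in customers],
                       [randomized_hash(v) for v in orders])
# Mixed case: one table hashed, the other still in plain text.
mixed_overlap = overlap([deterministic_hash(v) for v in customers], orders)

print(det_overlap, rand_overlap, mixed_overlap)  # 2 0 0
```

A keyed deterministic scheme, such as HMAC with one shared key, behaves like the first case: matches survive as long as both tables use the same key.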

FAQ 09 Do I need standard naming, formal foreign keys, or robust documentation for Schemantic to work?

No. Schemantic handles non-matching column names (customer_id vs. custNum), databases without foreign keys, and minimal documentation. Statistical analysis and pattern detection are the primary indicators, not naming conventions or constraints.

FAQ 10 Do you handle joins across many tables?

Yes. The path-finding tool constructs multi-step join paths in seconds by chaining high-quality direct joins. Each path includes a visual diagram and auto-generated SQL ready to copy.
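
A simplified version of chaining direct joins into a multi-step path is a breadth-first search over a graph whose edges are direct joins, followed by emitting one `JOIN` clause per hop. This sketch is a swapped-in technique, not Schemantic's path-finding algorithm: it finds the path with the fewest hops and ignores join quality, and the table and column names are hypothetical.

```python
from collections import deque

# Hypothetical direct joins: (table_a, column_a, table_b, column_b).
DIRECT_JOINS = [
    ("orders", "customer_id", "customers", "id"),
    ("orders", "id", "order_items", "order_id"),
    ("order_items", "product_id", "products", "id"),
]


def build_graph(joins):
    """Undirected adjacency list: table -> [(neighbor, own_column, neighbor_column)]."""
    graph = {}
    for ta, ca, tb, cb in joins:
        graph.setdefault(ta, []).append((tb, ca, cb))
        graph.setdefault(tb, []).append((ta, cb, ca))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the fewest-hop list of (from, to, from_col, to_col)."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for nxt, col, nxt_col in graph.get(table, []):
            if nxt not in came_from:
                came_from[nxt] = (table, col, nxt_col)
                queue.append(nxt)
    if goal not in came_from:
        return None  # no chain of direct joins connects the two tables
    steps = []
    node = goal
    while came_from[node] is not None:
        prev, col, nxt_col = came_from[node]
        steps.append((prev, node, col, nxt_col))
        node = prev
    return list(reversed(steps))


def path_to_sql(start, steps):
    """Turn a path into a SELECT with one JOIN clause per hop."""
    lines = ["SELECT *", f"FROM {start}"]
    for prev, table, col, nxt_col in steps:
        lines.append(f"JOIN {table} ON {prev}.{col} = {table}.{nxt_col}")
    return "\n".join(lines)


graph = build_graph(DIRECT_JOINS)
steps = find_path(graph, "customers", "products")
sql = path_to_sql("customers", steps)
print(sql)
```

Running it prints a three-hop query from `customers` through `orders` and `order_items` to `products`. A production tool would more likely weight each edge by join quality and use a shortest-path search over those weights instead of plain hop counts.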

FAQ 11 Do you include every table in your join search?

Generally yes — every table selected during onboarding. Tables with 0 or 1 rows are skipped. Tables not included in onboarding are excluded unless explicitly added.

FAQ 12 Why don't you include 1-row tables in your join search?

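A single-row table holds at most one value per column, which is too little evidence to establish a meaningful join. Aggregate statistics on a one-row table can also expose the row itself (its minimum, maximum, and mean all equal the stored value), which creates a security concern.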
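
The leak is easy to demonstrate: with only one row, every aggregate collapses to the stored value. The table and salary figure below are invented for illustration.

```python
import statistics

# Hypothetical one-row table: a single executive's salary.
salary_column = [187_500]

aggregates = {
    "min": min(salary_column),
    "max": max(salary_column),
    "mean": statistics.mean(salary_column),
    "median": statistics.median(salary_column),
}
print(aggregates)  # every "aggregate" equals the single stored value
```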

FAQ 13 What if your system misses a join or assigns a low join score?

You can declare a column as an ID column, which triggers a recheck against all other ID columns. The homepage table of approved and rejected joins lets you override low-scoring joins. Once approved, a join appears in the map, in pathfinding, and in auto-generated queries.

FAQ 14 How long will it take to find the joins in my data?

Most runs complete within several hours, and many within an hour. The main factor is the number of tables and columns, not total data volume: scanning five 100 GB tables is faster than scanning 500 1 GB tables, even though both total 500 GB, because there are far fewer column pairs to evaluate.
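
A rough count shows why table and column counts dominate. Assuming, purely for illustration, 20 columns per table and that every column is compared with every column in every other table (before any exclusions or pruning), the number of cross-table pairs is C(tables, 2) × columns². This is back-of-the-envelope arithmetic, not a description of how Schemantic schedules its work.

```python
from math import comb


def cross_table_column_pairs(num_tables: int, columns_per_table: int) -> int:
    # Pick two different tables, then one column from each.
    return comb(num_tables, 2) * columns_per_table ** 2


few_large = cross_table_column_pairs(5, 20)     # five 100 GB tables
many_small = cross_table_column_pairs(500, 20)  # 500 1 GB tables, same 500 GB in total
print(few_large, many_small)  # 4000 49900000
```

Same data volume, about 12,475 times as many candidate pairs: the count grows with the square of the number of tables, while row count only affects the cost of each comparison.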

FAQ 15 Why do join lines appear and disappear when I move or zoom the map?

The map only shows joins connecting tables currently in view to avoid visual clutter.

FAQ 16 How do I know if I should approve or reject a join?

Your decision determines whether the join is displayed in the map and used in path generation. If you are unsure, leave it unreviewed. You can reverse a decision later, but your actions are visible to other users.

FAQ 17 What is an entity and why is an entity-focused view useful?

An entity is a real-world person, place, or thing referenced in your data (customers, products, transactions). Schemantic groups columns that point to the same entity. The Entity Explorer shows joinable tables found for an entity, with pre-generated statistics. This accelerates understanding new datasets and building entity-focused analyses.

FAQ 18 How do I understand the relationship between a join and an entity?

A join indicates that two columns share the same identifier, so they reference the same real-world entity. When rows in the two tables match on that ID, both rows describe the same instance of the entity (for example, the same customer).

FAQ 19 What's included in descriptive statistics and why?

Standard statistics (min, max, mean, variance, skew, kurtosis) plus functional type detection, histograms, and automatic key detection. Since Schemantic already scans data for joins, providing descriptive stats adds immediate value with negligible extra cost.
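
For readers who want the definitions behind the headline statistics, here is a sketch of min, max, mean, variance, skew, and kurtosis using population moments. That is one common convention, assumed here; Schemantic's exact formulas (for example, sample versus population variance, or excess versus raw kurtosis) are not specified above, and functional type detection, histograms, and key detection are not covered.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Population-moment descriptive statistics for a non-empty numeric column."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = sum((x - mean) ** 2 for x in values) / n  # population variance
    if variance == 0:
        # A constant column has no spread, so skew and kurtosis are undefined.
        skew = kurtosis = math.nan
    else:
        skew = sum((x - mean) ** 3 for x in values) / n / variance ** 1.5
        # Excess kurtosis: subtract 3 so a normal distribution scores 0.
        kurtosis = sum((x - mean) ** 4 for x in values) / n / variance ** 2 - 3
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "variance": variance,
        "skew": skew,
        "kurtosis": kurtosis,
    }


print(describe([1, 2, 3, 4, 5]))
```

For the symmetric column 1 through 5, the mean is 3.0, the variance is 2.0, the skew is 0, and the excess kurtosis is about -1.3 (flatter than a normal distribution).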

FAQ 20 Are all statistics exact?

Most are calculated exactly and then rounded. Some use random sampling or close approximations to reduce runtime and query costs; their error margins are typically smaller than the effect of rounding.
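
A back-of-the-envelope check shows how sampling error can end up smaller than rounding. The numbers are assumptions for illustration only: a simple random sample of 100,000 rows, a true null rate near 20%, and a displayed value rounded to a whole percent. Schemantic's actual sample sizes and rounding rules are not documented here. The calculation uses the standard error of a proportion, sqrt(p(1 - p) / n).

```python
import math


def proportion_standard_error(p: float, sample_size: int) -> float:
    # Standard error of a proportion estimated from a simple random sample.
    return math.sqrt(p * (1 - p) / sample_size)


se = proportion_standard_error(0.20, 100_000)  # assumed null rate and sample size
margin_95 = 1.96 * se          # approximate 95% margin of error
rounding_halfwidth = 0.005     # rounding to a whole percent can shift a value by up to 0.5 points
print(round(margin_95, 4), margin_95 < rounding_halfwidth)  # 0.0025 True
```

Under these assumptions the sampling margin (about 0.25 percentage points) is half the worst-case rounding shift.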

FAQ 21 How long does the deployment process take?

Hours from initial authorization to viewing valid joins for many use cases. Large or complex datasets may take a few days. Extremely large environments (dozens or hundreds of terabytes) may take proportionately longer.

For petabyte-scale or further clarification, contact hello@schemantic.io.

FAQ 22 How do I refresh data to see up-to-date joins, entities, and statistics?

Edit workspace data sources from the Join Map page, select or deselect relevant data, save, and click "Join your data" to rerun inference.

FAQ 23 How many times per day can I refresh the data?

Up to four times per day, a limit that prevents accidental overuse. To request additional manual refreshes, email hello@schemantic.io.

FAQ 24 Can I accelerate access verification when logging in?

Yes. On Google Cloud you can: (1) request project- or organization-level permissions so that access is not checked table by table, and (2) request an increased IAM policy quota from Google Cloud (5,000 is usually sufficient, compared with the default of 1,500).

FAQ 25 Is there a max data size?

Schemantic scales with your cloud provider's capacity, and environments of many terabytes are routine. Contact hello@schemantic.io for extreme-scale environments.

FAQ 26 How is Schemantic different from other data tools?

Most comparable tools assume you already know how your data joins, or capture joins at the ELT/ETL level and recreate them per pipeline. Schemantic discovers joins from raw data and provides a collaborative interface for documenting and constructing SQL from those joins.

FAQ 27 Does Schemantic support real-time data processing?

No. Results reflect your data as of the moment a run is initiated. To include recently ingested data, refresh and rerun, allowing 15 minutes after ingestion for new data to become available before you start the run.

FAQ 28 On GCP, why service accounts instead of OAuth?

Service accounts provide more precise, restricted IAM controls. OAuth would allow broader user impersonation, which is less secure. The service-account approach is harder to engineer but safer for customers.

FAQ 29 What customer support is available?

Most questions are answered in this FAQ and in the documentation. For additional support, email hello@schemantic.io.

FAQ 30 How do I share ERDs, join code, etc. with other users?

Follow your company's data security policies. The simplest option is to grant colleagues Schemantic user licenses, which is cost-effective and the safest way to share data relationships.

FAQ 31 Why did you build this tool?

Discovering and validating table joins is tedious and error-prone across all data roles. Missing or erroneous joins hurt ML feature sets, dashboards, and downstream decisions. Schemantic automates the process.

FAQ 32 Do you offer training or onboarding assistance?

For enterprise support, contact hello@schemantic.io.

FAQ 33 How can I request new features?

Email hello@schemantic.io with ideas or requests.