<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[anthu.dev]]></title><description><![CDATA[Real problems, working code.]]></description><link>https://anthu.dev</link><generator>RSS for Node</generator><lastBuildDate>Tue, 07 Apr 2026 21:41:57 GMT</lastBuildDate><atom:link href="https://anthu.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Managing Snowflake organization listings from dbt]]></title><description><![CDATA[Most teams I talk to already treat dbt as the place where tables and views get built, tested, and deployed. The awkward bit shows up right after that: Internal and External Marketplace listings — shar]]></description><link>https://anthu.dev/managing-snowflake-organization-listings-from-dbt</link><guid isPermaLink="true">https://anthu.dev/managing-snowflake-organization-listings-from-dbt</guid><category><![CDATA[dbt]]></category><category><![CDATA[snowflake]]></category><category><![CDATA[Data Mesh]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Fri, 03 Apr 2026 01:36:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69664db126b81fb851c48b13/1d57faa3-f64c-413f-99ba-caefe777b6cf.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most teams I talk to already treat dbt as the place where tables and views get built, tested, and deployed. The awkward bit shows up right after that: <strong>Internal and External Marketplace</strong> listings — shares, grants, manifests, publish vs draft — often live in one-off SQL scripts, runbooks, or someone’s Snowsight tabs. The warehouse is versioned; the listing story is not.</p>
<p>I’ve been experimenting with a small dbt package that tries to close that gap: <a href="https://github.com/anthu/dbt-snowflake-listings"><strong>dbt-snowflake-listings</strong></a> (<code>anthu/dbt_snowflake_listings</code> in your <code>packages.yml</code>). The idea is simple: <strong>a listing is just another dbt model</strong> with a custom materialization. <code>dbt run</code> creates or alters the share, applies grants, and syncs the listing manifest — still wired into the DAG via ordinary <code>ref()</code> dependencies.</p>
<p>Fair warning up front: this is <strong>highly experimental</strong>. APIs and Snowflake listing behavior evolve; I’m dogfooding it on real projects, but I would not bet a compliance audit on “set and forget” without your own testing. Think of it as an opinionated spike that happened to grow tests and docs — not a supported product.</p>
<hr />
<h2>Why bother with listings-as-code?</h2>
<p>Organization listings sit on top of shares and metadata. Doing that by hand works until:</p>
<ul>
<li><p>You need the <strong>same objects</strong> you already model in dbt to appear in the share, in lockstep with builds.</p>
</li>
<li><p>You want <strong>idempotent</strong> updates (rerun deploy → alter listing, re-grant) instead of duplicate “create listing” scripts.</p>
</li>
<li><p>You care about <strong>reviewability</strong>: manifest YAML in Git next to the models consumers actually see.</p>
</li>
</ul>
<p>dbt already knows your graph. The package leans on that: objects are declared with <code>share_model</code> / <code>share_models</code> and <code>ref()</code>, so staging runs before the listing model, and semantic views (if you use them) stay in dependency order.</p>
<hr />
<h2>What it does (in one breath)</h2>
<ul>
<li><p>Custom materialization <code>organization_listing</code> for Internal Marketplace listings.</p>
</li>
<li><p><code>share_models([...])</code> (or <code>share_model</code>) to register what goes into the share; object types (table, view, semantic view, Cortex search service) are <strong>auto-detected</strong> at runtime so you are not hand-picking grant verbs for every object.</p>
</li>
<li><p><strong>Manifest as YAML</strong> under <code>config.meta.listing_manifest</code> in schema files — aligned with Snowflake’s <a href="https://docs.snowflake.com/en/user-guide/collaboration/listings/organizational/org-listing-manifest-reference">organization listing manifest reference</a>.</p>
</li>
<li><p><strong>Lifecycle</strong>: normal runs alter in place; <code>--full-refresh</code> is your escape hatch when you need drop/recreate semantics.</p>
</li>
<li><p>Optional <code>listing_ref()</code> macro for <a href="https://docs.snowflake.com/en/user-guide/collaboration/listings/organizational/org-listing-query">ULL-style</a> references on the producer side.</p>
</li>
</ul>
<p>There is also an <code>external_listing</code> materialization in the repo that I treat as a <strong>blueprint</strong> for public Marketplace flows — same ideas, different privileges and constraints. I’m focusing this post on <strong>organization listings</strong> because that’s where most internal sharing pain lives.</p>
<hr />
<h2>Install</h2>
<p>Add the package to <code>packages.yml</code> (pin a release tag you trust; the example below matches the sample project in the repo at time of writing):</p>
<pre><code class="language-yaml">packages:
  - git: "https://github.com/anthu/dbt-snowflake-listings.git"
    revision: v0.2.3
</code></pre>
<p>Then:</p>
<pre><code class="language-bash">dbt deps
</code></pre>
<p>You’ll need a Snowflake role that can create shares and organization listings (often something like <code>ACCOUNTADMIN</code> during a spike, or a dedicated role with the right grants). The package ships a <code>grant_listing_privileges</code> run-operation if you want to standardize that — see the repo’s <code>docs/macros.md</code>.</p>
<hr />
<h2>Minimal pattern: two files</h2>
<p>For a full, runnable example, see the latest version in the repo itself; I’ll try to keep it in sync with this post.</p>
<h3>1. Listing model (<code>.sql</code>)</h3>
<p>The model’s config selects the materialization and names the share. The body lists what gets granted into that share using <code>ref()</code> so dbt’s DAG stays honest:</p>
<pre><code class="language-sql">{{ config(
    materialized='organization_listing',
    meta={
        'share_name': 'TPCH_SAMPLE_SHARE',
        'publish': true,
    },
) }}

{{ dbt_snowflake_listings.share_models([
    ref('stg_tpch_nation'),
    ref('stg_tpch_region'),
    ref('stg_tpch_customer'),
    ref('stg_tpch_orders'),
]) }}
</code></pre>
<p>That snippet is lifted from the <strong>TPC-H sample</strong> example under <a href="https://github.com/anthu/dbt-snowflake-listings/tree/main/examples/snowflake_sample_data"><code>examples/snowflake_sample_data/</code></a> — it shares staging models built from <code>SNOWFLAKE_SAMPLE_DATA</code>, which is a nice zero-ingestion way to try the flow.</p>
<h3>2. Manifest (<code>.yml</code>)</h3>
<p>Keep prose and marketplace-facing fields in YAML next to the model. At minimum you want a clear <strong>title</strong>, <strong>description</strong>, and <strong>organization_targets</strong>; everything else maps to Snowflake’s manifest schema:</p>
<pre><code class="language-yaml">models:
  - name: tpch_sample_listing
    description: &gt;
      Organization listing that shares TPC-H benchmark sample tables with all
      accounts in the organization via the Internal Marketplace.
    config:
      meta:
        listing_manifest:
          title: "TPC-H Sample Data (tables)"
          description: |
            Sample data from the TPC-H benchmark dataset, sourced from
            Snowflake's SNOWFLAKE_SAMPLE_DATA database.
          organization_profile: "INTERNAL"
          organization_targets:
            access:
              - all_internal_accounts: true
          locations:
            access_regions:
              - name: "ALL"
          auto_fulfillment:
            refresh_type: "SUB_DATABASE"
            refresh_schedule: "10 MINUTE"
          usage_examples:
            - title: "Top customers by order volume"
              description: "Find the most active customers by number of orders placed"
              query: &gt;
                SELECT
                    c.CUSTOMER_NAME,
                    c.MARKET_SEGMENT,
                    COUNT(*) AS order_count,
                    SUM(o.TOTAL_PRICE) AS total_spend
                FROM STG_TPCH_CUSTOMER c
                JOIN STG_TPCH_ORDERS o ON c.CUSTOMER_KEY = o.CUSTOMER_KEY
                GROUP BY 1, 2
                ORDER BY order_count DESC
                LIMIT 20
</code></pre>
<p>Good <code>usage_examples</code> are worth the time: they show up in the listing experience and they force you to write SQL that actually matches what subscribers will query.</p>
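<p>Because manifest typos only surface as Snowflake errors at deploy time, a tiny pre-flight check in CI pays for itself. A minimal Python sketch (the required-key list is my own convention, not something the package enforces):</p>
<pre><code class="language-python"># Pre-flight check for a listing_manifest block. The key list is my own
# convention - the package itself does not enforce it.
REQUIRED_KEYS = {"title", "description", "organization_targets"}

def missing_manifest_keys(manifest):
    """Return the required top-level keys absent from the manifest."""
    return REQUIRED_KEYS - set(manifest)

manifest = {
    "title": "TPC-H Sample Data (tables)",
    "description": "Sample data from the TPC-H benchmark dataset.",
    "organization_profile": "INTERNAL",
}

print(sorted(missing_manifest_keys(manifest)))  # ['organization_targets']
</code></pre>
<p>Run it over the parsed <code>listing_manifest</code> mapping before <code>dbt run</code> and you catch missing fields without a round-trip to Snowflake.</p>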
<hr />
<h2>Run it</h2>
<pre><code class="language-bash">dbt run --select +tpch_sample_listing
</code></pre>
<p>Or run the whole project if your graph is small. On success you should see the share, grants, and listing aligned with what you declared — without maintaining a parallel script tree.</p>
<p>When you need a hard reset:</p>
<pre><code class="language-bash">dbt run --select tpch_sample_listing --full-refresh
</code></pre>
<hr />
<h2>Producer-side querying with <code>listing_ref</code></h2>
<p>If you want to reference shared objects via a <strong>Uniform Listing Locator</strong> from the producer project (where the models already live), the package exposes:</p>
<pre><code class="language-sql">SELECT *
FROM {{ dbt_snowflake_listings.listing_ref('MY_LISTING', ref('my_shared_table')) }}
</code></pre>
<p>Consumers in other accounts still see whatever names the listing exposes; this macro is mainly for keeping producer analytics consistent with the same DAG.</p>
<hr />
<h2>What I’d watch closely</h2>
<ul>
<li><p><strong>Privileges and org settings</strong> — listing creation fails in boring ways if the role is short a grant; bake that into your platform story early.</p>
</li>
<li><p><strong>Manifest vs reality</strong> — YAML typos or invalid combinations surface as Snowflake errors; treat manifest changes like DDL reviews.</p>
</li>
<li><p><strong>Experimental tier</strong> — I ship semver tags, but you should still pin and read the changelog when upgrading. If something breaks, open an issue on the repo; I’m motivated by real-world friction.</p>
</li>
</ul>
<hr />
<h2>Further reading</h2>
<ul>
<li><p><a href="https://docs.snowflake.com/en/user-guide/collaboration/listings/organizational/org-listing-manifest-reference">Organization listing manifest reference (Snowflake Docs)</a></p>
</li>
<li><p><a href="https://docs.snowflake.com/en/user-guide/collaboration/listings/organizational/org-listing-query">Querying organization listings with ULL</a></p>
</li>
<li><p><a href="https://github.com/anthu/dbt-snowflake-listings">dbt-snowflake-listings on GitHub</a> — README, <code>docs/configuration.md</code>, <code>docs/lifecycle.md</code>, <code>examples/snowflake_sample_data/</code></p>
</li>
</ul>
<hr />
<p>If you try it on a real internal listing, I’m curious whether the two-file pattern (SQL for the graph, YAML for the manifest) matches how your team already reviews dbt changes — or where it fights your process. That feedback is what turns an experiment into something durable.</p>
]]></content:encoded></item><item><title><![CDATA[From dbt Models to Snowflake Semantic Views: Best Practices for Cortex Analyst]]></title><description><![CDATA[Most teams I talk to already have a decent dbt project and are now looking at Semantic Views and Cortex Analyst. The question is usually:

“How do I reuse what we have in dbt instead of building a sec]]></description><link>https://anthu.dev/from-dbt-models-to-snowflake-semantic-views-best-practices-for-cortex-analyst</link><guid isPermaLink="true">https://anthu.dev/from-dbt-models-to-snowflake-semantic-views-best-practices-for-cortex-analyst</guid><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Fri, 03 Apr 2026 01:19:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69664db126b81fb851c48b13/e6eccd7d-0ae8-4d61-b973-cef1179249f9.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most teams I talk to already have a decent dbt project and are now looking at <strong>Semantic Views</strong> and <strong>Cortex Analyst</strong>. The question is usually:</p>
<blockquote>
<p>“How do I reuse what we have in dbt instead of building a second semantic layer from scratch?”</p>
</blockquote>
<p>The good news: you don’t have to choose. dbt can stay your <strong>transformation + modeling</strong> workhorse, and Semantic Views can become the <strong>thin semantic layer</strong> that powers Cortex Analyst (and other consumers) on top.</p>
<p>In this post I’ll show my favorite pattern, one that works well in practice:</p>
<ul>
<li>Define Semantic Views <strong>inside</strong> dbt with a dedicated package.</li>
</ul>
<p>Along the way I’ll share a few design tips that made Cortex Analyst answers a lot more predictable.</p>
<hr />
<h2>Why add Semantic Views on top of dbt?</h2>
<p>dbt is great at turning raw data into clean <strong>dim/fact models</strong> and shared SQL logic.</p>
<p>Semantic Views pick up where dbt stops:</p>
<ul>
<li><p>They describe <strong>business concepts</strong> (dimensions, time dimensions, metrics, relationships).</p>
</li>
<li><p>They live as first‑class Snowflake objects (<code>SEMANTIC VIEW</code>) with proper governance and sharing.</p>
</li>
<li><p>They’re what <strong>Cortex Analyst</strong> uses to translate natural language into SQL.</p>
</li>
</ul>
<p>A simple way to think about it:</p>
<ul>
<li><p>dbt: <em>“What does the warehouse look like?”</em></p>
</li>
<li><p>Semantic Views: <em>“How does the business talk about this data?”</em></p>
</li>
</ul>
<hr />
<h2>Define Semantic Views in dbt (<code>Snowflake-Labs/dbt_semantic_view</code>)</h2>
<p>If you already treat dbt as your system of record, this is the most natural option.</p>
<ol>
<li>Add the package:</li>
</ol>
<pre><code class="language-yaml">packages:
  - package: Snowflake-Labs/dbt_semantic_view
    version: 1.0.3
</code></pre>
<ol>
<li>Create a model that uses the <code>semantic_view</code> materialization and points at your existing models:</li>
</ol>
<pre><code class="language-sql">{{ config(materialized = 'semantic_view', schema = 'SEMANTICS') }}

semantic_view:
  name: orders_analytics_sv

  tables:
    - name: dim_customers
      base_table: {{ ref('dim_customers') }}
      primary_key: customer_id

    - name: fct_orders
      base_table: {{ ref('fct_orders') }}
      primary_key: order_id
      time_dimensions:
        - name: order_date
          expr: order_date

  metrics:
    - name: total_revenue
      expr: "SUM(fct_orders.order_amount)"
</code></pre>
<ol>
<li>Run it like any other dbt model:</li>
</ol>
<pre><code class="language-bash">dbt run --select orders_analytics_sv
</code></pre>
<p>Behind the scenes this compiles to a <code>CREATE SEMANTIC VIEW</code> statement. You get:</p>
<ul>
<li><p>A managed Semantic View in Snowflake.</p>
</li>
<li><p>Version control and reviews via dbt.</p>
</li>
<li><p>The same deployment story as the rest of your project.</p>
</li>
</ul>
<p>This is ideal when you want <strong>one place</strong> (dbt) to define how tables, relationships, and metrics are wired.</p>
<hr />
<h2>Making Cortex Analyst happy</h2>
<p>Regardless of how you create Semantic Views, a few simple rules go a long way:</p>
<ul>
<li><p><strong>Keep them focused.</strong> Think in domains like “Orders &amp; Customers” or “Account Usage”, not “everything in the warehouse”.</p>
</li>
<li><p><strong>Model real business language.</strong> Use dimensions and metrics that match how people actually ask questions (“revenue”, “active customers”, “region”). Add synonyms if your org loves acronyms.</p>
</li>
<li><p><strong>Wire relationships explicitly.</strong> Many‑to‑one from facts to dimensions on clean keys avoids a lot of weird joins.</p>
</li>
<li><p><strong>Start with a handful of verified questions.</strong> For your first Semantic View, capture 5–10 real questions and the exact SQL you expect, and use those as guardrails when you iterate. (Verified questions are in Private Preview as of writing.)</p>
</li>
</ul>
<p>With that in place, Cortex Analyst has a much easier time turning “show me revenue by customer segment for last quarter” into the SQL you would have written yourself.</p>
<hr />
<h2>A pragmatic migration path</h2>
<p>If I had to start from scratch on an existing dbt project, I’d do this:</p>
<ol>
<li><p>Pick one high‑value domain (e.g. product analytics, finance, account usage).</p>
</li>
<li><p>Create a <strong>single</strong> Semantic View for that domain using either option above.</p>
</li>
<li><p>Add 5–15 metrics that matter and a small set of verified questions.</p>
</li>
<li><p>Put it in front of real users, see which questions fail, and iterate.</p>
</li>
</ol>
<p>Once that loop feels smooth, repeat for the next domain.</p>
<p>You end up with a thin, governed semantic layer on top of dbt that unlocks natural language and other consumers—without throwing away the modeling work you already invested in.</p>
<hr />
<h2>Bonus: experimental dbt package for converting dbt semantic models to Semantic Views</h2>
<p>One thing I ran into while working on this: a lot of teams are already investing in <strong>dbt’s semantic layer</strong> (semantic models, measures, entities), but would still like to end up with <strong>Snowflake-native Semantic Views</strong> at the end of the day.</p>
<p>Instead of rewriting everything by hand, I started an <strong>experimental</strong> dbt package that tries to bridge exactly that gap:</p>
<blockquote>
<p><a href="https://github.com/anthu/dbt_semantic_view_converter"><code>anthu/dbt_semantic_view_converter</code></a> – very early, APIs may change, feedback highly welcome.</p>
</blockquote>
<p>The idea is simple:</p>
<ul>
<li><p>You define <strong>semantic models</strong> in dbt the way you normally would (in <code>schema.yml</code>).</p>
</li>
<li><p>You add a small <strong>semantic view model</strong> with a special materialization.</p>
</li>
<li><p>When you run <code>dbt run</code>, the package generates the corresponding <code>CREATE SEMANTIC VIEW</code> DDL for you and creates a Snowflake Semantic View based on that config.</p>
</li>
</ul>
<h3>How it works (high level)</h3>
<ol>
<li><strong>Install the package</strong> in <code>packages.yml</code>:</li>
</ol>
<pre><code class="language-yaml">packages:
  - git: "https://github.com/sfc-gh-ahuck/dbt_semantic_view_converter.git"
    revision: main
</code></pre>
<p>Then:</p>
<pre><code class="language-bash">dbt deps
dbt parse
</code></pre>
<ol>
<li><strong>Define your semantic model</strong> (dbt semantic layer) in <code>schema.yml</code>:</li>
</ol>
<pre><code class="language-yaml">semantic_models:
  - name: orders
    description: "Order fact table"
    model: ref('dim_orders')

    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign

    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: order_status
        type: categorical

    measures:
      - name: order_total
        agg: sum
      - name: order_count
        expr: 1
        agg: sum
</code></pre>
<ol>
<li><strong>Create a corresponding “semantic view” model</strong> in dbt:</li>
</ol>
<pre><code class="language-sql">-- models/semantic_views/orders_semantic_view.sql

{{ config(
    materialized = 'semantic_view',
    schema = 'semantic_layer'
) }}

-- The SELECT itself is just a placeholder.
-- The package reads the semantic model config and generates the DDL.
SELECT 1 AS placeholder;
</code></pre>
<ol>
<li><strong>Run dbt</strong>:</li>
</ol>
<pre><code class="language-bash">dbt run --models orders_semantic_view
</code></pre>
<p>Behind the scenes, the package inspects your semantic model config and emits a <code>CREATE OR REPLACE SEMANTIC VIEW</code> statement with tables, relationships, dimensions, facts, and metrics wired up according to that definition.</p>
<p>The end result looks roughly like:</p>
<pre><code class="language-sql">CREATE OR REPLACE SEMANTIC VIEW analytics.semantic_layer.orders
  COMMENT = 'Order fact table'
  /* TABLES, RELATIONSHIPS, DIMENSIONS, METRICS ... */
  COPY GRANTS;
</code></pre>
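<p>If you want to eyeball that statement outside dbt, its shape is easy to mock. A Python sketch (not the package’s actual generator; clause bodies are elided just like in the example above):</p>
<pre><code class="language-python"># Mock the rough shape of the emitted DDL - NOT the package's real code.
def semantic_view_ddl(database, schema, name, comment):
    """Build a CREATE SEMANTIC VIEW skeleton with the clause bodies elided."""
    return (
        f"CREATE OR REPLACE SEMANTIC VIEW {database}.{schema}.{name}\n"
        f"  COMMENT = '{comment}'\n"
        "  /* TABLES, RELATIONSHIPS, DIMENSIONS, METRICS ... */\n"
        "  COPY GRANTS;"
    )

print(semantic_view_ddl("analytics", "semantic_layer", "orders", "Order fact table"))
</code></pre>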
<h3>Why this might be interesting</h3>
<p>If you’re already invested in dbt’s semantic layer, this gives you a path to:</p>
<ul>
<li><p>Keep <strong>one definition of business logic</strong> (semantic models in dbt),</p>
</li>
<li><p>But still end up with <strong>Snowflake-native Semantic Views</strong> that Cortex Analyst and other tools can consume directly,</p>
</li>
<li><p>While reusing dbt’s existing workflows for <strong>dependencies, tests, docs, and CI/CD</strong>.</p>
</li>
</ul>
<p>It’s especially handy for teams who:</p>
<ul>
<li><p>Don’t want to maintain two separate semantic definitions,</p>
</li>
<li><p>Prefer reviewing semantic changes via PRs in the dbt repo,</p>
</li>
<li><p>And like the idea of “dbt is the source of truth, Snowflake Semantic Views are the runtime interface”.</p>
</li>
</ul>
<h3>Please treat it as experimental</h3>
<p>This is very much a <strong>work-in-progress</strong>:</p>
<ul>
<li><p>The materialization name, config options, and generated SQL shape may still change.</p>
</li>
<li><p>Error messages and guardrails are basic.</p>
</li>
<li><p>I’m still figuring out what the “right” abstraction level is (how much of the Snowflake DDL to expose vs hide).</p>
</li>
</ul>
<p>So: <strong>don’t</strong> drop it straight into mission‑critical projects yet.</p>
<p>If you do try it out in a sandbox or side‑project, I’d really love feedback:</p>
<ul>
<li><p>Does the workflow fit how you use dbt today?</p>
</li>
<li><p>Is the mapping from semantic models → Semantic View what you expect?</p>
</li>
<li><p>What would you need before trusting this in a real environment?</p>
</li>
</ul>
<p>Issues, discussions, and PRs are all welcome in the repo:</p>
<p>👉 <a href="https://github.com/anthu/dbt_semantic_view_converter"><code>anthu/dbt_semantic_view_converter</code> on GitHub</a></p>
<hr />
<h2>Further reading</h2>
<ul>
<li><p><a href="https://docs.snowflake.com/en/user-guide/views-semantic/best-practices-dev">Best practices for semantic views (Snowflake Docs)</a></p>
</li>
<li><p><a href="https://docs.snowflake.com/en/user-guide/ui-snowsight-data-databases-view">Using Snowsight to create and manage semantic views</a></p>
</li>
<li><p><a href="https://github.com/Snowflake-Labs/dbt_semantic_view"><code>dbt_semantic_view</code> package on dbt Hub</a></p>
</li>
<li><p><a href="https://github.com/anthu/dbt_semantic_view_converter"><code>anthu/dbt_semantic_view_converter</code> on GitHub</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Multi-Model Cost Optimization with Snowflake Cortex and OpenClaw]]></title><description><![CDATA[In Part 1 I connected OpenClaw to Snowflake Cortex as the LLM backend. Enterprise-grade security, unified billing, data stays in Snowflake. So far so good.
But after running it for a while I noticed something: most of the tokens OpenClaw burns throug...]]></description><link>https://anthu.dev/multi-model-cost-optimization-with-snowflake-cortex-and-openclaw</link><guid isPermaLink="true">https://anthu.dev/multi-model-cost-optimization-with-snowflake-cortex-and-openclaw</guid><category><![CDATA[snowflake]]></category><category><![CDATA[openclaw]]></category><category><![CDATA[AI]]></category><category><![CDATA[agents]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Tue, 17 Feb 2026 18:36:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/g494Z7pba14/upload/47d02e6d2b55bb2f9fabd9ffc16ca2b3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a target="_blank" href="https://anthu.dev/connecting-openclaw-to-snowflake-cortex">Part 1</a> I connected OpenClaw to Snowflake Cortex as the LLM backend. Enterprise-grade security, unified billing, data stays in Snowflake. So far so good.</p>
<p>But after running it for a while I noticed something: most of the tokens OpenClaw burns through aren't from the main agent doing complex reasoning. They're from subagents doing file searches, reading docs, and grepping through code. Simple stuff. And all of that was running on the same expensive model as the main agent.</p>
<p>My naive approach was using <code>claude-sonnet-4-5</code> for everything. And you guessed it - the bill reflected that.</p>
<h2 id="heading-the-key-insight-not-every-task-needs-a-3-model">The Key Insight: Not Every Task Needs a $3 Model</h2>
<p>OpenClaw uses a hierarchical architecture. The main agent does the thinking - planning, reasoning, decision-making. But it delegates a lot of the grunt work to subagents: searching files, reading documentation, generating boilerplate code. These tasks are straightforward enough that a $0.03 model handles them just fine.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771353697770/9458d42f-bbbf-45e7-8f1f-838eaade3ee5.png" alt class="image--center mx-auto" /></p>
<p>The main agent stays on the best model. The subagents run on whatever is cheapest for their specific job.</p>
<h2 id="heading-what-cheap-models-can-handle">What Cheap Models Can Handle</h2>
<p>I spent some time testing which models are "good enough" for different subagent tasks. Here's what I found:</p>
<p><strong>Ultra-cheap ($0.03-$0.06)</strong> works perfectly for file search, grep, listing directories, and reading/summarizing docs. These are essentially pattern matching tasks. <code>llama3.1-8b</code> at $0.03/$0.03 per 1M tokens is my go-to here. For doc summarization, <code>openai-gpt-5-nano</code> at $0.06/$0.44 does a solid job.</p>
<p><strong>Budget models ($0.12-$0.25)</strong> are good for boilerplate code generation, test scaffolding, simple refactoring, and config file generation. <code>snowflake-llama-3.3-70b</code> at $0.12/$0.12 is particularly interesting here because Snowflake tuned it specifically for their workloads. <code>llama3.1-70b</code> at $0.25/$0.25 handles general code generation well.</p>
<p><strong>Mid-tier ($1.00-$1.25)</strong> is where you go when you actually need reasoning: code reviews, bug analysis, API integrations. <code>claude-haiku-4-5</code> at $1.00/$5.00 or <code>openai-gpt-5</code> at $1.25/$10.00.</p>
<h2 id="heading-the-numbers-dont-lie">The Numbers Don't Lie</h2>
<p>Let's do the math on a typical exploration-heavy session. Say 100K input tokens and 50K output tokens total, with about 80% of that going to subagents (which is realistic for codebase exploration).</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Main Agent Cost</td><td>Subagent Cost</td><td><strong>Total</strong></td></tr>
</thead>
<tbody>
<tr>
<td>All Sonnet</td><td>$0.21</td><td>$0.84</td><td>$1.05</td></tr>
<tr>
<td>Sonnet + Haiku</td><td>$0.21</td><td>$0.28</td><td>$0.49</td></tr>
<tr>
<td>Sonnet + llama3.1-8b</td><td>$0.21</td><td><strong>$0.004</strong></td><td><strong>$0.214</strong></td></tr>
<tr>
<td>Sonnet + llama3.1-70b</td><td>$0.21</td><td><strong>$0.03</strong></td><td><strong>$0.24</strong></td></tr>
</tbody>
</table>
</div><p>That's roughly an <strong>80% cost reduction</strong> going from all-Sonnet to Sonnet + llama3.1-8b subagents, with the remaining spend almost entirely on the main agent. Even compared to Haiku subagents, llama3.1-8b is 97% cheaper on input and 99% cheaper on output.</p>
<p>What a difference.</p>
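<p>The subagent column is plain price-per-million arithmetic. A small Python helper to reproduce it, using the per-token prices quoted in the tiers above:</p>
<pre><code class="language-python"># Reproduce the subagent cost column: 80K input / 40K output tokens
# (80% of the 100K/50K session), priced in USD per 1M tokens.
PRICES = {  # (input, output) USD per 1M tokens, as quoted above
    "claude-haiku-4-5": (1.00, 5.00),
    "llama3.1-8b": (0.03, 0.03),
    "llama3.1-70b": (0.25, 0.25),
}

def cost_usd(model, input_tokens, output_tokens):
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(cost_usd("claude-haiku-4-5", 80_000, 40_000), 3))  # 0.28
print(round(cost_usd("llama3.1-8b", 80_000, 40_000), 4))       # 0.0036
print(round(cost_usd("llama3.1-70b", 80_000, 40_000), 2))      # 0.03
</code></pre>
<p>Swap in your own token split to see where the crossover points sit for your workloads.</p>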
<h2 id="heading-configuration">Configuration</h2>
<p>The setup lives in two places: <code>~/.openclaw/openclaw.json</code> for the provider config and <code>~/.openclaw/agents/main/agent/models.json</code> for model definitions. Here's my exploration-heavy config that I use most of the time:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"providers"</span>: {
    <span class="hljs-attr">"cortex"</span>: {
      <span class="hljs-attr">"baseUrl"</span>: <span class="hljs-string">"https://&lt;org&gt;-&lt;account&gt;.snowflakecomputing.com/api/v2/cortex/v1"</span>,
      <span class="hljs-attr">"apiKey"</span>: <span class="hljs-string">"&lt;your-pat-token&gt;"</span>,
      <span class="hljs-attr">"api"</span>: <span class="hljs-string">"openai-completions"</span>,
      <span class="hljs-attr">"models"</span>: [
        {
          <span class="hljs-attr">"id"</span>: <span class="hljs-string">"claude-sonnet-4-5"</span>,
          <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Cortex Claude Sonnet 4.5"</span>,
          <span class="hljs-attr">"reasoning"</span>: <span class="hljs-literal">true</span>,
          <span class="hljs-attr">"input"</span>: [<span class="hljs-string">"text"</span>, <span class="hljs-string">"image"</span>],
          <span class="hljs-attr">"contextWindow"</span>: <span class="hljs-number">200000</span>,
          <span class="hljs-attr">"maxTokens"</span>: <span class="hljs-number">16384</span>,
          <span class="hljs-attr">"cost"</span>: {<span class="hljs-attr">"input"</span>: <span class="hljs-number">3.00</span>, <span class="hljs-attr">"output"</span>: <span class="hljs-number">15.00</span>, <span class="hljs-attr">"cacheRead"</span>: <span class="hljs-number">0.30</span>, <span class="hljs-attr">"cacheWrite"</span>: <span class="hljs-number">3.75</span>},
          <span class="hljs-attr">"compat"</span>: {<span class="hljs-attr">"supportsDeveloperRole"</span>: <span class="hljs-literal">false</span>, <span class="hljs-attr">"maxTokensField"</span>: <span class="hljs-string">"max_completion_tokens"</span>}
        },
        {
          <span class="hljs-attr">"id"</span>: <span class="hljs-string">"llama3.1-8b"</span>,
          <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Cortex Llama 3.1 8B"</span>,
          <span class="hljs-attr">"reasoning"</span>: <span class="hljs-literal">false</span>,
          <span class="hljs-attr">"input"</span>: [<span class="hljs-string">"text"</span>],
          <span class="hljs-attr">"contextWindow"</span>: <span class="hljs-number">32000</span>,
          <span class="hljs-attr">"maxTokens"</span>: <span class="hljs-number">8192</span>,
          <span class="hljs-attr">"cost"</span>: {<span class="hljs-attr">"input"</span>: <span class="hljs-number">0.03</span>, <span class="hljs-attr">"output"</span>: <span class="hljs-number">0.03</span>, <span class="hljs-attr">"cacheRead"</span>: <span class="hljs-number">0</span>, <span class="hljs-attr">"cacheWrite"</span>: <span class="hljs-number">0</span>},
          <span class="hljs-attr">"compat"</span>: {<span class="hljs-attr">"supportsDeveloperRole"</span>: <span class="hljs-literal">false</span>, <span class="hljs-attr">"maxTokensField"</span>: <span class="hljs-string">"max_completion_tokens"</span>}
        }
      ]
    }
  },
  <span class="hljs-attr">"agents"</span>: {
    <span class="hljs-attr">"defaults"</span>: {
      <span class="hljs-attr">"model"</span>: {<span class="hljs-attr">"primary"</span>: <span class="hljs-string">"cortex/claude-sonnet-4-5"</span>},
      <span class="hljs-attr">"subagents"</span>: {
        <span class="hljs-attr">"maxConcurrent"</span>: <span class="hljs-number">8</span>,
        <span class="hljs-attr">"model"</span>: <span class="hljs-string">"cortex/llama3.1-8b"</span>
      }
    }
  }
}
</code></pre>
<p>The important bit is the <code>cost</code> field on each model. With those configured, the OpenClaw dashboard actually tracks your spend per model - so you can see exactly where the money goes.</p>
<h2 id="heading-workflow-specific-setups">Workflow-Specific Setups</h2>
<p>I switch between a few configurations depending on what I'm doing:</p>
<p><strong>Exploration &amp; file search</strong>: Sonnet main + <code>llama3.1-8b</code> subagents ($0.03/$0.03). This is my default. 97% cheaper than Haiku subagents and perfectly fine for grepping, finding files, and navigating codebases.</p>
<p><strong>Documentation &amp; research</strong>: Sonnet main + <code>openai-gpt-5-nano</code> subagents ($0.06/$0.44). Slightly more capable for summarization tasks but still 94% cheaper on input than Haiku.</p>
<p><strong>Code generation</strong>: Sonnet main + <code>llama3.1-70b</code> subagents ($0.25/$0.25). When the subagents need to write actual code rather than just find files. Balanced quality/cost.</p>
<p><strong>Snowflake-specific work</strong>: Sonnet main + <code>snowflake-llama-3.3-70b</code> subagents ($0.12/$0.12). Snowflake's own tuned model. 88% cheaper than Haiku and optimized for SQL generation and data tasks. I use this when working on Snowflake projects specifically.</p>
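<p>To put numbers on those tier choices, here's a quick back-of-the-envelope script using the per-token prices quoted above. The workload itself (100 subagent calls at 5k input / 1k output tokens each) is a made-up round number, not a measurement:</p>

```python
# Rough cost comparison for a batch of subagent calls.
# Prices are $/1M tokens (input, output) from the tiers above;
# the workload size is a hypothetical example, not measured data.
PRICES = {
    "claude-haiku-4-5":        (1.00, 5.00),
    "llama3.1-70b":            (0.25, 0.25),
    "snowflake-llama-3.3-70b": (0.12, 0.12),
    "openai-gpt-5-nano":       (0.06, 0.44),
    "llama3.1-8b":             (0.03, 0.03),
}

def batch_cost(model, calls=100, in_tok=5_000, out_tok=1_000):
    """Dollar cost of `calls` subagent invocations on `model`."""
    in_price, out_price = PRICES[model]
    return calls * (in_tok * in_price + out_tok * out_price) / 1_000_000

for model in PRICES:
    print(f"{model}: ${batch_cost(model):.4f}")
```

<p>For this hypothetical mix, Haiku comes out at $1.00 and <code>llama3.1-8b</code> at under two cents - the same order of savings as the "97% cheaper" figure above.</p>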
<p><strong>Maximum quality</strong>: When I'm working on critical production code or complex architecture and cost doesn't matter - <code>claude-opus-4-6</code> main + <code>claude-sonnet-4-5</code> subagents. Premium everything. But honestly I rarely need this.</p>
<h2 id="heading-monitoring-what-you-spend">Monitoring What You Spend</h2>
<p>Beyond the OpenClaw dashboard, you can query actual consumption directly from Snowflake:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> 
    MODEL_NAME,
    <span class="hljs-keyword">SUM</span>(INPUT_TOKENS) <span class="hljs-keyword">as</span> total_input_tokens,
    <span class="hljs-keyword">SUM</span>(OUTPUT_TOKENS) <span class="hljs-keyword">as</span> total_output_tokens,
    <span class="hljs-keyword">SUM</span>(CREDITS_USED) <span class="hljs-keyword">as</span> total_credits
<span class="hljs-keyword">FROM</span> SNOWFLAKE.ACCOUNT_USAGE.CORTEX_REST_API_USAGE_HISTORY
<span class="hljs-keyword">WHERE</span> START_TIME &gt;= <span class="hljs-keyword">DATEADD</span>(<span class="hljs-keyword">day</span>, <span class="hljs-number">-7</span>, <span class="hljs-keyword">CURRENT_TIMESTAMP</span>())
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> MODEL_NAME
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> total_credits <span class="hljs-keyword">DESC</span>;
</code></pre>
<p>This gives you the ground truth on what's actually being consumed. I run this weekly to make sure my assumptions about subagent token distribution still hold.</p>
<p>But you don't always want to write SQL just to check how things are going. OpenClaw itself ships with a usage view that breaks down token consumption and cost per model, per session. Once you have the <code>cost</code> fields configured in your model definitions (as shown above), the dashboard picks them up automatically and gives you a nice overview of where your tokens are going. In my case the split is pretty obvious - the main agent shows up as one big chunk and the subagent calls are spread across dozens of small, cheap requests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771354018314/8d5a2560-9dea-4c35-80e0-9fa70e0a3362.png" alt class="image--center mx-auto" /></p>
<p>What I like about this view is that you can immediately see if a subagent task is burning more tokens than expected. If one of the llama3.1-8b calls suddenly shows high token counts, that's usually a sign that the task is too complex for the cheap model and should be bumped up a tier. Most of the time though, the numbers confirm what you'd expect: the majority of subagent calls are tiny and cheap.</p>
<h2 id="heading-available-models-at-a-glance">Available Models at a Glance</h2>
<p>Cortex currently offers 22 models across the REST API. Here are the ones I find most relevant for OpenClaw setups, grouped by what I'd actually use them for:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Tier</td><td>Model</td><td>Input/Output ($/1M tokens)</td><td>Use Case</td></tr>
</thead>
<tbody>
<tr>
<td>Premium</td><td>claude-opus-4-6</td><td>$5.00/$25.00</td><td>Main agent (max quality)</td></tr>
<tr>
<td>Standard</td><td>claude-sonnet-4-5</td><td>$3.00/$15.00</td><td>Main agent (recommended)</td></tr>
<tr>
<td>Budget</td><td>claude-haiku-4-5</td><td>$1.00/$5.00</td><td>Quality subagents</td></tr>
<tr>
<td>Budget</td><td>llama3.1-70b</td><td>$0.25/$0.25</td><td>Code generation subagents</td></tr>
<tr>
<td>Ultra-Budget</td><td>snowflake-llama-3.3-70b</td><td>$0.12/$0.12</td><td>Snowflake-specific subagents</td></tr>
<tr>
<td>Ultra-Budget</td><td>openai-gpt-5-nano</td><td>$0.06/$0.44</td><td>Doc summarization subagents</td></tr>
<tr>
<td>Ultra-Budget</td><td>llama3.1-8b</td><td>$0.03/$0.03</td><td>File search subagents</td></tr>
<tr>
<td>Ultra-Budget</td><td>mistral-7b</td><td>$0.03/$0.03</td><td>Pattern matching subagents</td></tr>
</tbody>
</table>
</div><p>There are more models available (deepseek-r1, llama4-maverick, mistral-large2, openai-o4-mini, etc.) but these are the ones I actually use regularly.</p>
<h2 id="heading-practical-tips">Practical Tips</h2>
<p><strong>Start ultra-cheap.</strong> Use <code>llama3.1-8b</code> for all subagents first. Upgrade individual task types only when you notice quality issues. You'd be surprised how rarely that happens for file search and navigation tasks.</p>
<p><strong>Match model to task, not to habit.</strong> It's tempting to use Haiku everywhere because it's "the cheap Claude model." But for most subagent tasks, it's leaving money on the table. A $0.03 model that searches files is just as good as a $1.00 model for that specific job.</p>
<p><strong>Use prompt caching.</strong> Cortex supports prompt caching for OpenAI and Anthropic models. For OpenAI models it's implicit (kicks in at 1024+ tokens). For Anthropic models you need to add cache points in the request. Either way, it cuts repeated context costs dramatically.</p>
<p><strong>Run subagents in parallel.</strong> With <code>maxConcurrent: 8</code> and $0.03 subagents, you can do a lot of exploration in parallel for almost nothing. Much better than sequentially running one expensive agent.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The main takeaway: most of the tokens your AI coding agent burns through are on simple tasks. File search, grep, doc reading - these don't need expensive models. By routing them to $0.03 models via Snowflake Cortex, you keep the quality where it matters (the main agent) while cutting overall costs by 90%+.</p>
<p>And because everything runs through Cortex, you get unified billing, enterprise security, and the ability to monitor actual usage through Snowflake's account usage views. No separate API keys to manage, no surprise bills from different providers.</p>
]]></content:encoded></item><item><title><![CDATA[Connecting OpenClaw to Snowflake Cortex]]></title><description><![CDATA[I've been playing around with OpenClaw - an open-source AI assistant framework - and wanted to hook it up to Snowflake's Cortex LLM API. The idea: use enterprise-grade models like Claude Sonnet 4.5 through Snowflake's infrastructure while keeping my ...]]></description><link>https://anthu.dev/connecting-openclaw-to-snowflake-cortex</link><guid isPermaLink="true">https://anthu.dev/connecting-openclaw-to-snowflake-cortex</guid><category><![CDATA[snowflake cortex]]></category><category><![CDATA[snowflake]]></category><category><![CDATA[snowflake tutorial]]></category><category><![CDATA[openclaw]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Tue, 17 Feb 2026 07:05:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/TcN2ucbpBQg/upload/2ab07cb08ed3cd538e4bbc730227294a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've been playing around with <a target="_blank" href="https://openclaw.ai">OpenClaw</a> - an open-source AI assistant framework - and wanted to hook it up to Snowflake's Cortex LLM API. The idea: use enterprise-grade models like Claude Sonnet 4.5 through Snowflake's infrastructure while keeping my existing config as a fallback. Sounds straightforward, right?</p>
<p>Well, it kind of is. And kind of isn't. The integration itself is surprisingly clean, but getting there involved a few detours I didn't expect.</p>
<h2 id="heading-why-cortex">Why Cortex?</h2>
<p>Here's the thing most people don't realize about Snowflake Cortex: it exposes a Chat Completions API that's a superset of the OpenAI API. That means any tool that supports OpenAI can - in theory - connect to Snowflake with minimal changes. You get Claude, GPT, Llama, Mistral and others through a single endpoint, all billed through Snowflake credits. Plus the usual enterprise goodies: network policies, PAT tokens with role restrictions, audit logging. All built-in.</p>
<p>So far so good.</p>
<h2 id="heading-setting-up-a-least-privilege-service-account">Setting Up a Least-Privilege Service Account</h2>
<p>Rather than reusing an admin account (please don't), I created a dedicated service user. The <code>SNOWFLAKE.CORTEX_USER</code> database role grants access to Cortex LLM functions - nothing more. No data access, no warehouse modification rights.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Create a role with only Cortex access</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">ROLE</span> JEEVES_ROLE;

<span class="hljs-comment">-- Grant the Cortex User database role (required for REST API)</span>
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">DATABASE</span> <span class="hljs-keyword">ROLE</span> SNOWFLAKE.CORTEX_USER <span class="hljs-keyword">TO</span> <span class="hljs-keyword">ROLE</span> JEEVES_ROLE;

<span class="hljs-comment">-- Grant warehouse usage (no OPERATE/MODIFY - just usage)</span>
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> WAREHOUSE TASK_WH <span class="hljs-keyword">TO</span> <span class="hljs-keyword">ROLE</span> JEEVES_ROLE;
</code></pre>
<p>Since the service runs from a known IP range, I also locked it down with a network rule and policy:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Create a network rule for the service's IP range</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> NETWORK RULE ADMIN_DB.NETWORK_POLICY_MGMT.JEEVES_SERVICE
  <span class="hljs-keyword">MODE</span> = INGRESS
  <span class="hljs-keyword">TYPE</span> = IPV4
  VALUE_LIST = (<span class="hljs-string">'10.0.1.0/24'</span>);  <span class="hljs-comment">-- Could be a VPC CIDR, a static IP, whatever fits your setup</span>

<span class="hljs-comment">-- Create a network policy referencing the rule</span>
<span class="hljs-keyword">CREATE</span> NETWORK <span class="hljs-keyword">POLICY</span> JEEVES_POLICY
  ALLOWED_NETWORK_RULE_LIST = (ADMIN_DB.NETWORK_POLICY_MGMT.JEEVES_SERVICE);
</code></pre>
<p>Even if the PAT token gets compromised, it can only be used from the allowed IP range. Belt and suspenders.</p>
<p>For the user itself, Snowflake's <code>TYPE = SERVICE</code> is the right choice for programmatic access:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> JEEVES
  <span class="hljs-keyword">TYPE</span> = SERVICE
  DEFAULT_ROLE = JEEVES_ROLE
  DEFAULT_WAREHOUSE = TASK_WH
  NETWORK_POLICY = JEEVES_POLICY
  <span class="hljs-keyword">COMMENT</span> = <span class="hljs-string">'AI assistant service account'</span>;

<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">ROLE</span> JEEVES_ROLE <span class="hljs-keyword">TO</span> <span class="hljs-keyword">USER</span> JEEVES;
</code></pre>
<p>And for auth, a PAT token with role restriction:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">USER</span> JEEVES <span class="hljs-keyword">ADD</span> PROGRAMMATIC <span class="hljs-keyword">ACCESS</span> TOKEN 
  JEEVES_PAT
  ROLE_RESTRICTION = <span class="hljs-string">'JEEVES_ROLE'</span>
  DAYS_TO_EXPIRY = <span class="hljs-number">365</span>;
</code></pre>
<p>The <code>ROLE_RESTRICTION</code> bit is important - it prevents the token from being used with elevated privileges even if someone grants additional roles to the user later. And heads up: the token secret is only shown once at creation time. Save it immediately.</p>
<h2 id="heading-configuring-the-application">Configuring the Application</h2>
<p>The Cortex Chat Completions API endpoint follows this pattern:</p>
<pre><code class="lang-bash">https://&lt;org&gt;-&lt;account_name&gt;.snowflakecomputing.com/api/v2/cortex/v1
</code></pre>
<p>One thing that tripped me up right away: you need to use your <strong>account name</strong>, not the account locator. You can check yours with:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> CURRENT_ORGANIZATION_NAME() || <span class="hljs-string">'-'</span> || CURRENT_ACCOUNT_NAME();
<span class="hljs-comment">-- e.g. returns: myorganization-myaccount</span>
</code></pre>
<p>So the URL becomes: <code>https://myorganization-myaccount.snowflakecomputing.com/api/v2/cortex/v1</code></p>
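<p>If you want to build that URL programmatically, it's a one-liner. The org and account names below are placeholders - substitute whatever the query above returns for your account:</p>

```python
# Construct the Cortex base URL from organization and account name.
# "myorganization" / "myaccount" are placeholder values from the
# example above - use your own CURRENT_ORGANIZATION_NAME() / CURRENT_ACCOUNT_NAME().
def cortex_base_url(org: str, account: str) -> str:
    return f"https://{org}-{account}.snowflakecomputing.com/api/v2/cortex/v1"

print(cortex_base_url("myorganization", "myaccount"))
```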
<p>For OpenClaw, I added a new provider using the OpenAI-compatible API:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"env"</span>: {
    <span class="hljs-attr">"CORTEX_API_KEY"</span>: <span class="hljs-string">"&lt;JEEVES_PAT_TOKEN&gt;"</span>
  },
  <span class="hljs-attr">"models"</span>: {
    <span class="hljs-attr">"mode"</span>: <span class="hljs-string">"merge"</span>,
    <span class="hljs-attr">"providers"</span>: {
      <span class="hljs-attr">"cortex"</span>: {
        <span class="hljs-attr">"baseUrl"</span>: <span class="hljs-string">"https://&lt;orgname&gt;-&lt;account_name&gt;.snowflakecomputing.com/api/v2/cortex/v1"</span>,
        <span class="hljs-attr">"apiKey"</span>: <span class="hljs-string">"${CORTEX_API_KEY}"</span>,
        <span class="hljs-attr">"api"</span>: <span class="hljs-string">"openai-completions"</span>,
        <span class="hljs-attr">"models"</span>: [
          {
            <span class="hljs-attr">"id"</span>: <span class="hljs-string">"claude-sonnet-4-5"</span>,
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Cortex Sonnet 4.5"</span>,
            <span class="hljs-attr">"reasoning"</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">"input"</span>: [<span class="hljs-string">"text"</span>, <span class="hljs-string">"image"</span>],
            <span class="hljs-attr">"contextWindow"</span>: <span class="hljs-number">200000</span>,
            <span class="hljs-attr">"maxTokens"</span>: <span class="hljs-number">16384</span>,
            <span class="hljs-attr">"compat"</span>: {
              <span class="hljs-attr">"maxTokensField"</span>: <span class="hljs-string">"max_completion_tokens"</span>,
              <span class="hljs-attr">"supportsDeveloperRole"</span>: <span class="hljs-literal">false</span>
            }
          }
        ]
      }
    }
  },
  <span class="hljs-attr">"agents"</span>: {
    <span class="hljs-attr">"defaults"</span>: {
      <span class="hljs-attr">"model"</span>: {
        <span class="hljs-attr">"primary"</span>: <span class="hljs-string">"cortex/claude-sonnet-4-5"</span>,
        <span class="hljs-attr">"fallbacks"</span>: [<span class="hljs-string">"snowflake/claude-sonnet-4-5"</span>]
      }
    }
  }
}
</code></pre>
<p>See that <code>compat</code> object? Those two settings are doing the heavy lifting. More on that in a second.</p>
<h2 id="heading-the-troubleshooting-journey">The Troubleshooting Journey</h2>
<p>Getting this working wasn't exactly a smooth ride. Here's what actually happened.</p>
<h3 id="heading-404-not-found">404 Not Found</h3>
<p>My first attempt used the account locator in the URL. 404. The fix was trivial once I figured it out - use the account name, not the locator:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> CURRENT_ACCOUNT_NAME();  <span class="hljs-comment">-- Returns your account name (use this)</span>
<span class="hljs-keyword">SELECT</span> CURRENT_ACCOUNT();       <span class="hljs-comment">-- Returns the account locator (don't use this!)</span>
</code></pre>
<h3 id="heading-developer-messages-are-not-supported">"developer messages are not supported"</h3>
<p>A debugging proxy revealed the culprit. OpenAI introduced a <code>developer</code> message role for reasoning models like o1. When OpenClaw detects a reasoning model (<code>"reasoning": true</code>), it sends system prompts with <code>role: "developer"</code> instead of <code>role: "system"</code>. Cortex doesn't support this.</p>
<p>Fix: <code>"supportsDeveloperRole": false</code> in the model's <code>compat</code> settings.</p>
<h3 id="heading-maxtokens-is-deprecated">"max_tokens is deprecated"</h3>
<p>Earlier testing with curl already revealed this one. OpenAI's newer API uses <code>max_completion_tokens</code> instead of the legacy <code>max_tokens</code>. Cortex follows this convention strictly and will reject requests using the old parameter.</p>
<p>Fix: <code>"maxTokensField": "max_completion_tokens"</code> in the model's <code>compat</code> settings.</p>
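<p>Taken together, the two <code>compat</code> settings amount to a small rewrite of each outgoing request. The sketch below is my mental model of that translation - an illustration of the behavior, not OpenClaw's actual source code:</p>

```python
# Illustrates the request rewrite the two compat settings imply.
# This mimics the behavior described above; it is NOT OpenClaw source code.
def apply_compat(request: dict) -> dict:
    req = dict(request)
    # "maxTokensField": "max_completion_tokens" - rename the legacy field
    if "max_tokens" in req:
        req["max_completion_tokens"] = req.pop("max_tokens")
    # "supportsDeveloperRole": false - downgrade developer role to system
    req["messages"] = [
        {**m, "role": "system"} if m.get("role") == "developer" else m
        for m in req.get("messages", [])
    ]
    return req

out = apply_compat({
    "model": "claude-sonnet-4-5",
    "max_tokens": 16384,
    "messages": [{"role": "developer", "content": "You are a helpful assistant."}],
})
print(out)
```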
<h3 id="heading-and-it-works">And It Works</h3>
<p>After applying both fixes:</p>
<pre><code class="lang-bash">$ openclaw agent --<span class="hljs-built_in">local</span> -m <span class="hljs-string">"Say hello in one word"</span> --session-id <span class="hljs-built_in">test</span>
Hello!
</code></pre>
<p>What a relief.</p>
<h2 id="heading-what-i-learned">What I Learned</h2>
<p>There are a few debugging lessons worth highlighting. First, use a proxy when debugging API integrations - it reveals the actual request and response bodies, especially when responses are compressed. Second, "OpenAI-compatible" doesn't mean identical. Providers implement different subsets of the API, and the edge cases will get you. And third, 400 errors often have descriptive JSON bodies if you can get past the gzip encoding.</p>
<h2 id="heading-security-layers">Security Layers</h2>
<p>For those keeping score, here's what the setup looks like from a security perspective:</p>
<ul>
<li><p>Authentication goes through a PAT token, not a password.</p></li>
<li><p>Authorization is restricted to the <code>CORTEX_USER</code> database role only.</p></li>
<li><p>Network access is limited to an IP allowlist via network policy.</p></li>
<li><p>The service account has zero data access - Cortex API only.</p></li>
<li><p>The PAT is role-restricted, so it can't be used to escalate privileges.</p></li>
<li><p>The warehouse grant is <code>USAGE</code> only, with no modify rights.</p></li>
<li><p>The token expires after 365 days, with rotation capability.</p></li>
</ul>
<h2 id="heading-the-result">The Result</h2>
<p>OpenClaw now uses Snowflake Cortex as its primary LLM provider, with an existing Snowflake Anthropic endpoint as a fallback. All AI inference routes through my Snowflake account, which gives me centralized billing, enterprise audit logging, network-layer security, consistent access to the latest models, and automatic failover if the primary is unavailable.</p>
<p>The key takeaways: use <code>TYPE = SERVICE</code> for programmatic users. Always use <code>ROLE_RESTRICTION</code> on PAT tokens. Network policies add a real extra layer for service accounts. The OpenAI-compatible endpoint at <code>/api/v2/cortex/v1</code> makes integration with existing tools surprisingly straightforward - once you work through the quirks.</p>
<p>If you're running any tool that speaks OpenAI, connecting it to Cortex is worth the effort.</p>
<h2 id="heading-whats-next">What's Next</h2>
<p>Now that the plumbing is in place, the interesting part begins. Cortex gives you access to a whole range of models through a single endpoint - <strong>Claude, GPT, Llama, Mistral, and more</strong>. That means you can start matching the right model to the right task instead of throwing your most expensive model at everything.</p>
<p><strong>My plan is to set up specialized agents</strong>: a lightweight model like Haiku for quick summarization and triage, something like Llama for code generation where you need fast iteration, and Sonnet for the heavy lifting - complex reasoning, architecture decisions, that kind of thing. Same endpoint, same auth, same billing. You just swap the model ID in the config and you're done.</p>
<p>The cost savings add up quickly. Not every task needs the biggest model, and with Cortex you don't need separate API keys, billing accounts, or provider integrations to find that out. One service account, one network policy, multiple models - each doing what it's best at.</p>
<p><strong>The write-up on this follows in the next few days - subscribe so you don't miss it!</strong></p>
<hr />
<p><strong>Resources:</strong></p>
<ul>
<li><p><a target="_blank" href="https://docs.snowflake.com/en/user-guide/snowflake-cortex/open_ai_sdk">Cortex Chat Completions API Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://docs.snowflake.com/en/user-guide/programmatic-access-tokens">Programmatic Access Tokens</a></p>
</li>
<li><p><a target="_blank" href="https://docs.snowflake.com/en/user-guide/network-policies">Network Policies</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[The Hidden Trap in Snowflake's INFER_SCHEMA]]></title><description><![CDATA[I've always been a fan of Snowflake's INFER_SCHEMA() function. Not only because it saves you from manually typing out column definitions but the whole idea of letting Snowflake figure out your schema from actual data feels like the right level of aut...]]></description><link>https://anthu.dev/the-hidden-trap-in-snowflakes-inferschema</link><guid isPermaLink="true">https://anthu.dev/the-hidden-trap-in-snowflakes-inferschema</guid><category><![CDATA[snowflake-bites]]></category><category><![CDATA[snowflake]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[Snowpipe]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Tue, 13 Jan 2026 20:51:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/H55X9zjmM1A/upload/0aa99389916aad159fbfb40514a5688d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've always been a fan of Snowflake's <code>INFER_SCHEMA()</code> function. Not only because it saves you from manually typing out column definitions but the whole idea of letting Snowflake figure out your schema from actual data feels like the right level of automation. But recently I had to learn its limitations the hard way.</p>
<h2 id="heading-the-problem-that-bit-me">The Problem That Bit Me</h2>
<p>Here's what happened: I loaded a sample file with IDs like <code>1</code>, <code>2</code>, <code>3</code> — single-digit integers. <code>INFER_SCHEMA()</code> happily inferred <code>NUMBER(1,0)</code> for the column. Makes sense, right? Minimal precision for single digits.</p>
<p>Then production data arrived with IDs in the millions.</p>
<pre><code>Numeric value '1234567' is out of range
</code></pre>
<p>What a bummer.</p>
<p>The issue is that <code>INFER_SCHEMA()</code> is <em>too</em> precise. It looks at your sample data and picks the tightest type that fits. A file with values like <code>1.5</code> and <code>2.3</code> gets inferred as <code>NUMBER(2,1)</code> - which explodes when <code>123.456</code> shows up later.</p>
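<p>The failure mode is easy to reproduce with a toy model of "pick the tightest integer type that fits the sample". This is just an illustration of the behavior, not Snowflake's actual inference algorithm:</p>

```python
# Toy model of tightest-fit integer inference - NOT Snowflake's actual
# algorithm, just an illustration of why sampled precision bites later.
def infer_precision(samples):
    """Smallest NUMBER(p, 0) precision that holds every sample value."""
    return max(len(str(abs(v))) for v in samples)

def fits(value, precision):
    """Does this integer fit in NUMBER(precision, 0)?"""
    return len(str(abs(value))) <= precision

p = infer_precision([1, 2, 3])   # sample file yields precision 1, i.e. NUMBER(1,0)
print(p, fits(1234567, p))       # the production ID no longer fits
```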
<h2 id="heading-the-template-pattern-nobody-talks-about">The Template Pattern Nobody Talks About</h2>
<p>Here's the thing: when you create a table using <code>USING TEMPLATE</code>, you're not locked into the raw output of <code>INFER_SCHEMA()</code>. The template is just SQL — and you can transform it however you want.</p>
<p>Most tutorials show you this basic pattern:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> my_table
<span class="hljs-keyword">USING</span> <span class="hljs-keyword">TEMPLATE</span> (
    <span class="hljs-keyword">SELECT</span> ARRAY_AGG(OBJECT_CONSTRUCT(*))
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(INFER_SCHEMA(...))
);
</code></pre>
<p>So far so good. But that <code>OBJECT_CONSTRUCT(*)</code> is where the magic happens — or doesn't happen if you're just passing everything through unchanged.</p>
<h2 id="heading-sql-functions-to-the-rescue">SQL Functions to the Rescue</h2>
<p>You can use any SQL function inside the template to transform column names, types, or other properties. Let me show you what I mean.</p>
<p><strong>Uppercasing column names:</strong></p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> my_table
<span class="hljs-keyword">USING</span> <span class="hljs-keyword">TEMPLATE</span> (
    <span class="hljs-keyword">SELECT</span> ARRAY_AGG(
        OBJECT_CONSTRUCT(
            <span class="hljs-string">'COLUMN_NAME'</span>, <span class="hljs-keyword">UPPER</span>(COLUMN_NAME),  <span class="hljs-comment">-- Force uppercase</span>
            <span class="hljs-string">'TYPE'</span>, <span class="hljs-keyword">TYPE</span>,
            <span class="hljs-string">'NULLABLE'</span>, NULLABLE,
            <span class="hljs-string">'ORDER_ID'</span>, ORDER_ID
        )
    )
    <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> ORDER_ID)
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(INFER_SCHEMA(...))
);
</code></pre>
<p>Why would you want this? Schema evolution creates columns in UPPERCASE. If your initial columns are lowercase from the CSV header but new columns come in as uppercase, you end up with case mismatches. Normalizing upfront avoids the headache.</p>
<p><strong>Broadening numeric types:</strong></p>
<p>Here's the pattern I use now for every <code>INFER_SCHEMA</code> workflow:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> my_table
<span class="hljs-keyword">USING</span> <span class="hljs-keyword">TEMPLATE</span> (
    <span class="hljs-keyword">SELECT</span> ARRAY_AGG(
        OBJECT_CONSTRUCT(
            <span class="hljs-string">'COLUMN_NAME'</span>, <span class="hljs-keyword">UPPER</span>(COLUMN_NAME),
            <span class="hljs-string">'TYPE'</span>, <span class="hljs-keyword">CASE</span> 
                <span class="hljs-keyword">WHEN</span> <span class="hljs-keyword">REGEXP_LIKE</span>(<span class="hljs-keyword">TYPE</span>, <span class="hljs-string">'NUMBER\\([0-9]+,\\s?0\\)'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'NUMBER(38, 0)'</span>
                <span class="hljs-keyword">WHEN</span> STARTSWITH(<span class="hljs-keyword">TYPE</span>, <span class="hljs-string">'NUMBER('</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'DOUBLE'</span>
                <span class="hljs-keyword">ELSE</span> <span class="hljs-keyword">TYPE</span>
            <span class="hljs-keyword">END</span>,
            <span class="hljs-string">'NULLABLE'</span>, NULLABLE,
            <span class="hljs-string">'ORDER_ID'</span>, ORDER_ID
        )
    )
    <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> ORDER_ID)
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(INFER_SCHEMA(...))
);
</code></pre>
<p>Let me break down that <code>CASE</code> statement:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pattern</td><td>Transformation</td><td>Why</td></tr>
</thead>
<tbody>
<tr>
<td><code>NUMBER(X, 0)</code></td><td><code>NUMBER(38, 0)</code></td><td>Integers get minimal precision — <code>NUMBER(1,0)</code> for single digits. Broadening to <code>NUMBER(38,0)</code> handles any integer.</td></tr>
<tr>
<td><code>NUMBER(X, Y)</code></td><td><code>DOUBLE</code></td><td>Decimals like <code>NUMBER(3,2)</code> overflow with larger values. <code>DOUBLE</code> provides flexibility for real-world data.</td></tr>
<tr>
<td>Everything else</td><td>Keep as-is</td><td>Strings, timestamps, booleans don't have this problem.</td></tr>
</tbody>
</table>
</div><p>The regex <code>NUMBER\\([0-9]+,\\s?0\\)</code> matches integer types (scale of 0), while <code>STARTSWITH(TYPE, 'NUMBER(')</code> catches any remaining numeric types that have decimal places.</p>
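<p>If it helps to sanity-check the branching, here's the same logic written out in Python. One subtlety worth knowing: Snowflake's <code>REGEXP_LIKE</code> implicitly matches the whole string, which <code>re.fullmatch</code> mirrors below:</p>

```python
import re

# Python mirror of the CASE expression above, for sanity-checking.
# Snowflake's REGEXP_LIKE anchors to the full string, hence fullmatch.
def broaden(inferred_type: str) -> str:
    if re.fullmatch(r"NUMBER\(\d+,\s?0\)", inferred_type):
        return "NUMBER(38, 0)"   # integer types get maximum precision
    if inferred_type.startswith("NUMBER("):
        return "DOUBLE"          # remaining (decimal) numerics go floating point
    return inferred_type         # strings, timestamps, booleans pass through

for t in ["NUMBER(1,0)", "NUMBER(3,2)", "VARCHAR(16777216)", "BOOLEAN"]:
    print(t, "->", broaden(t))
```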
<h2 id="heading-the-full-recipe">The Full Recipe</h2>
<p>Here's the complete pattern I use for CSV files:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Create file format that reads headers</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">FILE</span> <span class="hljs-keyword">FORMAT</span> my_csv_format
    <span class="hljs-keyword">TYPE</span> = <span class="hljs-string">'CSV'</span>
    PARSE_HEADER = <span class="hljs-literal">TRUE</span>
    ERROR_ON_COLUMN_COUNT_MISMATCH = <span class="hljs-literal">FALSE</span>;

<span class="hljs-comment">-- Create table with broadened types</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">TABLE</span> my_table
<span class="hljs-keyword">USING</span> <span class="hljs-keyword">TEMPLATE</span> (
    <span class="hljs-keyword">SELECT</span> ARRAY_AGG(
        OBJECT_CONSTRUCT(
            <span class="hljs-string">'COLUMN_NAME'</span>, <span class="hljs-keyword">UPPER</span>(COLUMN_NAME),
            <span class="hljs-string">'TYPE'</span>, <span class="hljs-keyword">CASE</span> 
                <span class="hljs-keyword">WHEN</span> <span class="hljs-keyword">REGEXP_LIKE</span>(<span class="hljs-keyword">TYPE</span>, <span class="hljs-string">'NUMBER\\([0-9]+,\\s?0\\)'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'NUMBER(38, 0)'</span>
                <span class="hljs-keyword">WHEN</span> STARTSWITH(<span class="hljs-keyword">TYPE</span>, <span class="hljs-string">'NUMBER('</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'DOUBLE'</span>
                <span class="hljs-keyword">ELSE</span> <span class="hljs-keyword">TYPE</span>
            <span class="hljs-keyword">END</span>,
            <span class="hljs-string">'NULLABLE'</span>, NULLABLE,
            <span class="hljs-string">'ORDER_ID'</span>, ORDER_ID
        )
    )
    <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> ORDER_ID)
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(
        INFER_SCHEMA(
            LOCATION =&gt; <span class="hljs-string">'@my_stage/data/'</span>,
            FILE_FORMAT =&gt; <span class="hljs-string">'my_csv_format'</span>
        )
    )
)
ENABLE_SCHEMA_EVOLUTION = <span class="hljs-literal">TRUE</span>;
</code></pre>
<p>The <code>ENABLE_SCHEMA_EVOLUTION = TRUE</code> is critical if you want new columns to be added automatically — but that's a story for the next post.</p>
<h2 id="heading-what-about-parquet">What About Parquet?</h2>
<p>Good news: Parquet files have type information embedded in their metadata, so the inferred types are much more accurate. You typically don't need the numeric broadening trick.</p>
<p>But you should still uppercase the column names for consistency with schema evolution:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> my_table
<span class="hljs-keyword">USING</span> <span class="hljs-keyword">TEMPLATE</span> (
    <span class="hljs-keyword">SELECT</span> ARRAY_AGG(
        OBJECT_CONSTRUCT(
            <span class="hljs-string">'COLUMN_NAME'</span>, <span class="hljs-keyword">UPPER</span>(COLUMN_NAME),  <span class="hljs-comment">-- Still important!</span>
            <span class="hljs-string">'TYPE'</span>, <span class="hljs-keyword">TYPE</span>,
            <span class="hljs-string">'NULLABLE'</span>, NULLABLE,
            <span class="hljs-string">'ORDER_ID'</span>, ORDER_ID
        )
    )
    <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> ORDER_ID)
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(INFER_SCHEMA(...))
);
</code></pre>
<h2 id="heading-the-takeaway">The Takeaway</h2>
<p><code>INFER_SCHEMA()</code> is fantastic for prototyping and development. But for production, don't trust it blindly — post-process the template using SQL functions to broaden numeric types and normalize column names. Your future self (and your pipelines) will thank you.</p>
<hr />
<p><em>Next up: How to combine this pattern with Snowpipe for automatic schema evolution. Stay tuned.</em></p>
]]></content:encoded></item><item><title><![CDATA[Inferring Schema from VARIANT Fields in Snowflake]]></title><description><![CDATA[I get asked about this a lot. Someone lands JSON from a REST API into a VARIANT column, and now they want proper columns without manually writing json_data:field1::VARCHAR, json_data:field2::NUMBER for every single path. And you guessed it - there's ...]]></description><link>https://anthu.dev/snowflake-variant-schema-inference</link><guid isPermaLink="true">https://anthu.dev/snowflake-variant-schema-inference</guid><category><![CDATA[snowflake]]></category><category><![CDATA[SQL]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[json]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Tue, 13 Jan 2026 16:01:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/CFKwL570ZSc/upload/3b1eead5bde46634178b3e0cb14b37c0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I get asked about this a lot. Someone lands JSON from a REST API into a VARIANT column, and now they want proper columns without manually writing <code>json_data:field1::VARCHAR</code>, <code>json_data:field2::NUMBER</code> for every single path. And you guessed it - there's no built-in <code>INFER_SCHEMA</code> for VARIANT columns like there is for staged files.</p>
<p>The reason? It's genuinely hard. Unlike staged files where Snowflake can sample a few files upfront, VARIANT columns can contain wildly different structures across rows, nested objects go arbitrarily deep, and arrays make things exponentially messier. So far so good - but that doesn't mean we can't build something ourselves.</p>
<h2 id="heading-the-core-trick-recursive-flatten-typeof">The Core Trick: Recursive FLATTEN + TYPEOF</h2>
<p>Here's the discovery query that makes everything possible:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">DISTINCT</span>
    f.path <span class="hljs-keyword">AS</span> original_path,
    <span class="hljs-keyword">UPPER</span>(<span class="hljs-keyword">REPLACE</span>(f.path, <span class="hljs-string">'.'</span>, <span class="hljs-string">'_'</span>)) <span class="hljs-keyword">AS</span> column_name,
    TYPEOF(f.value) <span class="hljs-keyword">AS</span> data_type
<span class="hljs-keyword">FROM</span> my_table,
<span class="hljs-keyword">LATERAL</span> FLATTEN(INPUT =&gt; json_data, <span class="hljs-keyword">RECURSIVE</span> =&gt; <span class="hljs-literal">TRUE</span>) f
<span class="hljs-keyword">WHERE</span> TYPEOF(f.value) <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">IN</span> (<span class="hljs-string">'OBJECT'</span>)
  <span class="hljs-keyword">AND</span> f.path <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'%[%'</span>
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> column_name;
</code></pre>
<p><code>FLATTEN</code> with <code>RECURSIVE =&gt; TRUE</code> walks the entire JSON tree and returns every path. <code>TYPEOF()</code> tells us what's at each path. The <code>NOT LIKE '%[%'</code> filter excludes array contents - I'll explain why in a moment.</p>
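<p>To see what that filter is doing, run the recursive <code>FLATTEN</code> against a toy document first (a quick sketch; the inline <code>PARSE_JSON</code> literal is only for illustration):</p>
<pre><code class="lang-sql">-- Paths inside arrays carry bracket indices like tags[0],
-- which is exactly what NOT LIKE '%[%' filters out
SELECT f.path, TYPEOF(f.value) AS data_type
FROM (SELECT PARSE_JSON('{"name": "x", "tags": ["a", "b"]}') AS j),
LATERAL FLATTEN(INPUT =&gt; j, RECURSIVE =&gt; TRUE) f;
</code></pre>
<p>You should get plain paths like <code>name</code> and <code>tags</code> alongside bracketed ones like <code>tags[0]</code>; only the former survive the filter.</p>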
<p>Run this against a typical API response and you'll see something like:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>original_path</th><th>column_name</th><th>data_type</th></tr>
</thead>
<tbody>
<tr>
<td>address.city</td><td>ADDRESS_CITY</td><td>VARCHAR</td></tr>
<tr>
<td>address.geo.lat</td><td>ADDRESS_GEO_LAT</td><td>DOUBLE</td></tr>
<tr>
<td>balance</td><td>BALANCE</td><td>DECIMAL</td></tr>
<tr>
<td>is_active</td><td>IS_ACTIVE</td><td>BOOLEAN</td></tr>
<tr>
<td>orders</td><td>ORDERS</td><td>ARRAY</td></tr>
<tr>
<td>tags</td><td>TAGS</td><td>ARRAY</td></tr>
</tbody>
</table>
</div><h2 id="heading-why-arrays-stay-as-variant">Why Arrays Stay as VARIANT</h2>
<p>My first instinct was to recursively explode everything. But that creates a mess:</p>
<ol>
<li><p>Arrays can have different lengths per row - do you create <code>ITEM_0</code>, <code>ITEM_1</code>, <code>ITEM_2</code>... how many?</p>
</li>
<li><p>Nested arrays explode your row count exponentially</p>
</li>
<li><p>The resulting schema becomes unpredictable</p>
</li>
</ol>
<p>Instead, the approach I landed on keeps top-level arrays as VARIANT columns. You can still query them with <code>LATERAL FLATTEN</code> when you need to, but your base schema stays stable. The people who need to dig into arrays can handle that downstream.</p>
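<p>When someone does need the array contents, a downstream query can explode just that one column. A sketch, assuming a flattened relation <code>api_data_flat</code> with a retained <code>ORDERS</code> VARIANT column and hypothetical <code>order_id</code>/<code>amount</code> fields inside it:</p>
<pre><code class="lang-sql">-- Explode one retained VARIANT array on demand;
-- the base schema stays stable for everyone else
SELECT
    t."ADDRESS_CITY",
    o.value:order_id::NUMBER AS order_id,
    o.value:amount::FLOAT    AS amount
FROM api_data_flat t,
LATERAL FLATTEN(INPUT =&gt; t."ORDERS") o;
</code></pre>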
<h2 id="heading-wrapping-it-in-a-procedure">Wrapping It in a Procedure</h2>
<p>Once you've got the discovery query working, wrapping it in a stored procedure makes it reusable:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">PROCEDURE</span> discover_json_schema(
    source_table <span class="hljs-built_in">VARCHAR</span>,
    variant_column <span class="hljs-built_in">VARCHAR</span>
)
<span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">ARRAY</span>
<span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span>
<span class="hljs-keyword">AS</span>
<span class="hljs-keyword">DECLARE</span>
    schema_array <span class="hljs-built_in">ARRAY</span>;
<span class="hljs-keyword">BEGIN</span>
    <span class="hljs-keyword">SELECT</span> ARRAY_AGG(OBJECT_CONSTRUCT(
        <span class="hljs-string">'original_path'</span>, original_path,
        <span class="hljs-string">'column_name'</span>, column_name,
        <span class="hljs-string">'sql_type'</span>, sql_type
    )) <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> column_name)
    <span class="hljs-keyword">INTO</span> schema_array
    <span class="hljs-keyword">FROM</span> (
        <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">DISTINCT</span>
            f.path <span class="hljs-keyword">AS</span> original_path,
            <span class="hljs-keyword">UPPER</span>(<span class="hljs-keyword">REPLACE</span>(f.path, <span class="hljs-string">'.'</span>, <span class="hljs-string">'_'</span>)) <span class="hljs-keyword">AS</span> column_name,
            <span class="hljs-keyword">CASE</span>
                <span class="hljs-keyword">WHEN</span> TYPEOF(f.value) = <span class="hljs-string">'INTEGER'</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'NUMBER'</span>
                <span class="hljs-keyword">WHEN</span> TYPEOF(f.value) <span class="hljs-keyword">IN</span> (<span class="hljs-string">'DOUBLE'</span>, <span class="hljs-string">'DECIMAL'</span>) <span class="hljs-keyword">THEN</span> <span class="hljs-string">'FLOAT'</span>
                <span class="hljs-keyword">WHEN</span> TYPEOF(f.value) = <span class="hljs-string">'BOOLEAN'</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'BOOLEAN'</span>
                <span class="hljs-keyword">WHEN</span> TYPEOF(f.value) = <span class="hljs-string">'ARRAY'</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'VARIANT'</span>
                <span class="hljs-keyword">ELSE</span> <span class="hljs-string">'VARCHAR'</span>
            <span class="hljs-keyword">END</span> <span class="hljs-keyword">AS</span> sql_type
        <span class="hljs-keyword">FROM</span> IDENTIFIER(:source_table),
        <span class="hljs-keyword">LATERAL</span> FLATTEN(INPUT =&gt; IDENTIFIER(:variant_column), <span class="hljs-keyword">RECURSIVE</span> =&gt; <span class="hljs-literal">TRUE</span>) f
        <span class="hljs-keyword">WHERE</span> TYPEOF(f.value) <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">IN</span> (<span class="hljs-string">'OBJECT'</span>)
          <span class="hljs-keyword">AND</span> f.path <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'%[%'</span>
        <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> column_name
    );

    RETURN schema_array;
<span class="hljs-keyword">END</span>;
</code></pre>
<p>Call it like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CALL</span> discover_json_schema(<span class="hljs-string">'my_db.my_schema.raw_api_data'</span>, <span class="hljs-string">'json_data'</span>);
</code></pre>
<p>Returns an array of objects you can loop through to generate DDL.</p>
<pre><code class="lang-json">[
  {
    <span class="hljs-attr">"column_name"</span>: <span class="hljs-string">"EMAIL"</span>,
    <span class="hljs-attr">"original_path"</span>: <span class="hljs-string">"email"</span>,
    <span class="hljs-attr">"sql_type"</span>: <span class="hljs-string">"VARCHAR"</span>
  },
  {
    <span class="hljs-attr">"column_name"</span>: <span class="hljs-string">"IS_ACTIVE"</span>,
    <span class="hljs-attr">"original_path"</span>: <span class="hljs-string">"is_active"</span>,
    <span class="hljs-attr">"sql_type"</span>: <span class="hljs-string">"BOOLEAN"</span>
  },
-- [... and so on ]
]
</code></pre>
<h2 id="heading-generating-views-automatically">Generating Views Automatically</h2>
<p>The natural next step - a procedure that creates a view with all discovered columns:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">PROCEDURE</span> generate_flattened_view(
    source_table <span class="hljs-built_in">VARCHAR</span>,
    variant_column <span class="hljs-built_in">VARCHAR</span>,
    target_view <span class="hljs-built_in">VARCHAR</span>
)
<span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">VARCHAR</span>
<span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span>
<span class="hljs-keyword">AS</span>
<span class="hljs-keyword">DECLARE</span>
    ddl_statement <span class="hljs-built_in">VARCHAR</span>;
    select_cols VARCHAR;
    schema_array ARRAY;
<span class="hljs-keyword">BEGIN</span>
    <span class="hljs-keyword">CALL</span> discover_json_schema(:source_table, :variant_column) <span class="hljs-keyword">INTO</span> schema_array;

    <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">LISTAGG</span>(
        <span class="hljs-string">'GET_PATH('</span> || :variant_column || <span class="hljs-string">', '''</span> || s.value:original_path::<span class="hljs-built_in">VARCHAR</span> || <span class="hljs-string">''')::'</span> ||
        s.value:sql_type::<span class="hljs-built_in">VARCHAR</span> || <span class="hljs-string">' AS "'</span> || s.value:column_name::<span class="hljs-built_in">VARCHAR</span> || <span class="hljs-string">'"'</span>,
        <span class="hljs-string">', '</span>
    ) <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> s.value:column_name::<span class="hljs-built_in">VARCHAR</span>)
    <span class="hljs-keyword">INTO</span> select_cols
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(FLATTEN(:schema_array)) s;

    ddl_statement := <span class="hljs-string">'CREATE OR REPLACE VIEW '</span> || :target_view ||
                     <span class="hljs-string">' AS SELECT '</span> || select_cols ||
                     <span class="hljs-string">' FROM '</span> || :source_table;
    <span class="hljs-keyword">EXECUTE</span> <span class="hljs-keyword">IMMEDIATE</span> ddl_statement;

    <span class="hljs-keyword">RETURN</span> <span class="hljs-string">'Created view: '</span> || :target_view;
<span class="hljs-keyword">END</span>;
</code></pre>
<p>Now one call flattens your entire JSON structure:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CALL</span> generate_flattened_view(
    <span class="hljs-string">'raw_api_data'</span>,
    <span class="hljs-string">'json_data'</span>,
    <span class="hljs-string">'api_data_flat'</span>
);
</code></pre>
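<p>For the sample schema from earlier, the procedure would emit DDL along these lines (a sketch with the column list abbreviated; the real statement comes out as one long line):</p>
<pre><code class="lang-sql">CREATE OR REPLACE VIEW api_data_flat AS
SELECT
    GET_PATH(json_data, 'address.city')::VARCHAR AS "ADDRESS_CITY",
    GET_PATH(json_data, 'balance')::FLOAT        AS "BALANCE",
    GET_PATH(json_data, 'is_active')::BOOLEAN    AS "IS_ACTIVE",
    GET_PATH(json_data, 'orders')::VARIANT       AS "ORDERS"
FROM raw_api_data;
</code></pre>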
<h2 id="heading-dynamic-tables-for-auto-refresh">Dynamic Tables for Auto-Refresh</h2>
<p>Same idea, but with Dynamic Tables for continuously refreshing data:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> <span class="hljs-keyword">PROCEDURE</span> generate_flattened_dynamic_table(
    source_table <span class="hljs-built_in">VARCHAR</span>,
    variant_column <span class="hljs-built_in">VARCHAR</span>,
    target_dt <span class="hljs-built_in">VARCHAR</span>,
    warehouse <span class="hljs-built_in">VARCHAR</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-string">'COMPUTE_WH'</span>,
    target_lag <span class="hljs-built_in">VARCHAR</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-string">'1 hour'</span>
)
<span class="hljs-keyword">RETURNS</span> <span class="hljs-built_in">VARCHAR</span>
<span class="hljs-keyword">LANGUAGE</span> <span class="hljs-keyword">SQL</span>
<span class="hljs-keyword">AS</span>
<span class="hljs-keyword">DECLARE</span>
    ddl_statement <span class="hljs-built_in">VARCHAR</span>;
    select_cols VARCHAR;
    schema_array ARRAY;
<span class="hljs-keyword">BEGIN</span>
    <span class="hljs-keyword">CALL</span> discover_json_schema(:source_table, :variant_column) <span class="hljs-keyword">INTO</span> schema_array;

    <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">LISTAGG</span>(
        <span class="hljs-string">'GET_PATH('</span> || :variant_column || <span class="hljs-string">', '''</span> || s.value:original_path::<span class="hljs-built_in">VARCHAR</span> || <span class="hljs-string">''')::'</span> ||
        s.value:sql_type::<span class="hljs-built_in">VARCHAR</span> || <span class="hljs-string">' AS "'</span> || s.value:column_name::<span class="hljs-built_in">VARCHAR</span> || <span class="hljs-string">'"'</span>,
        <span class="hljs-string">', '</span>
    ) <span class="hljs-keyword">WITHIN</span> <span class="hljs-keyword">GROUP</span> (<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> s.value:column_name::<span class="hljs-built_in">VARCHAR</span>)
    <span class="hljs-keyword">INTO</span> select_cols
    <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">TABLE</span>(FLATTEN(:schema_array)) s;

    ddl_statement := <span class="hljs-string">'CREATE OR REPLACE DYNAMIC TABLE '</span> || :target_dt ||
                     <span class="hljs-string">' TARGET_LAG = '''</span> || :target_lag || <span class="hljs-string">''''</span> ||
                     <span class="hljs-string">' WAREHOUSE = '</span> || :warehouse ||
                     <span class="hljs-string">' AS SELECT '</span> || select_cols ||
                     <span class="hljs-string">' FROM '</span> || :source_table;
    <span class="hljs-keyword">EXECUTE</span> <span class="hljs-keyword">IMMEDIATE</span> ddl_statement;

    <span class="hljs-keyword">RETURN</span> <span class="hljs-string">'Created dynamic table: '</span> || :target_dt;
<span class="hljs-keyword">END</span>;
</code></pre>
<h2 id="heading-caveats">Caveats</h2>
<p>This is a starting point, not a production-ready solution. Things you'll likely need to adjust:</p>
<ul>
<li><p><strong>Type conflicts</strong>: If the same path has different types across rows (e.g. APIs returning <code>null</code> vs <code>0</code>), the discovery picks one. You might want majority-wins logic or explicit overrides.</p>
</li>
<li><p><strong>Column name collisions</strong>: <code>user.id</code> and <code>user_id</code> both become <code>USER_ID</code>. Add disambiguation if your data has this - for example, use a different separator than <code>_</code> when building column names.</p>
</li>
<li><p><strong>Schema evolution</strong>: New fields in source JSON won't automatically appear. Re-run the procedure or build a scheduled task to detect drift.</p>
</li>
<li><p><strong>Performance</strong>: Scanning the entire table for schema discovery is expensive. Consider sampling with <a target="_blank" href="https://docs.snowflake.com/en/sql-reference/constructs/sample"><code>TABLESAMPLE</code></a> or <a target="_blank" href="https://docs.snowflake.com/en/sql-reference/constructs/limit"><code>LIMIT</code></a> for large tables.</p>
</li>
</ul>
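<p>For the first and last caveats, a majority-wins pass over a sample is one way out. A sketch: <code>MODE()</code> picks the most frequent type per path, and the <code>TABLESAMPLE</code> percentage (10% here, an arbitrary choice) keeps the scan cheap:</p>
<pre><code class="lang-sql">SELECT
    f.path AS original_path,
    MODE(TYPEOF(f.value)) AS data_type  -- most frequent type wins
FROM my_table TABLESAMPLE (10),         -- scan only ~10% of rows
LATERAL FLATTEN(INPUT =&gt; json_data, RECURSIVE =&gt; TRUE) f
WHERE TYPEOF(f.value) NOT IN ('OBJECT', 'NULL_VALUE')
  AND f.path NOT LIKE '%[%'
GROUP BY f.path;
</code></pre>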
<h2 id="heading-when-to-use-what">When to Use What</h2>
<ul>
<li><p><strong>One-off exploration</strong>: Run the discovery query directly, eyeball the results</p>
</li>
<li><p><strong>Stable schema, needs to stay current</strong>: Generate a View</p>
</li>
<li><p><strong>Performance-critical queries on semi-structured data</strong>: Generate a Dynamic Table</p>
</li>
<li><p><strong>Complex transformation logic</strong>: Use the Python variant of the procedure and add your business rules</p>
</li>
</ul>
<p>The full notebook with all procedures and sample data is available - drop me a line if you want it.</p>
]]></content:encoded></item><item><title><![CDATA[Random Numbers in Tableau]]></title><description><![CDATA[Having a lot of uniform data you might want to introduce some jitter to the data visualization.
Most of the tutorials would suggest to use a random number and apply it to an axis to offset the visualization point a bit. So far so good.
Tableau for spe...]]></description><link>https://anthu.dev/random-function-in-tableau</link><guid isPermaLink="true">https://anthu.dev/random-function-in-tableau</guid><category><![CDATA[tableau]]></category><category><![CDATA[BI]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Mon, 10 Jan 2022 11:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514930225/897c5378-a168-4313-9fbd-6e3b07a0f77c.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Having a lot of uniform data you might want to introduce some jitter to the data visualization.</p>
<p>Most tutorials suggest using a random number and applying it to an axis to offset the visualization point a bit. So far so good.</p>
<p>Tableau in particular had a hidden function called <code>RANDOM()</code> for quite a while, which got removed recently. Their forums are full of requests to bring it back, without much response from what I see.</p>
<p>On my search for a workaround I found following solution for Snowflake (should work for every DB, though):</p>
<pre><code class="lang-SQL">// The easiest way
RAWSQL_INT("random(42)")

// Uniform the random number to be between 0 and 1
RAWSQL_REAL("uniform(0::float, 1::float, random(42))")
</code></pre>
<p>The idea is quite straightforward: I'm simply using the <code>random()</code> function of my data source. In the case of Snowflake, I suggest using a seeded random function, which will prevent "jumping" data points after each reload (in my case the seed is 42).</p>
<p>In case you need a constrained random number, simply use the Snowflake <code>uniform()</code> function to map the random numbers to your needs.</p>
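<p>Putting the two together for typical jitter, you would map the random number to a small symmetric offset around zero (a sketch; the +/-0.5 range and the seed 42 are arbitrary choices):</p>
<pre><code class="lang-SQL">// Jitter field: seeded random offset between -0.5 and 0.5
RAWSQL_REAL("uniform(-0.5::float, 0.5::float, random(42))")
</code></pre>
<p>Add this calculated field to an axis and you get jitter that stays stable across reloads.</p>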
<p>This works for non-extracted data sources. For everything else, I'll post pseudo-random-number-generator code shortly. Interested? Subscribe to my blog or drop me a line on Twitter.</p>
]]></content:encoded></item><item><title><![CDATA[Finding a lost Apple Pencil using sysdiagnose]]></title><description><![CDATA[I've always been a fan of Apple's iCloud Find My. Not only because I tend to misplace my keys or my phone, but because of the overall experience of the ecosystem. But lately I had to learn the limitations the hard way. You can't use Find My for all Apple devices...]]></description><link>https://anthu.dev/finding-lost-apple-pencil</link><guid isPermaLink="true">https://anthu.dev/finding-lost-apple-pencil</guid><category><![CDATA[Apple]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Sun, 26 Dec 2021 11:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514782593/72ae2a74-3c3b-4038-bb19-7dc55a7fcacb.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've always been a fan of Apple's <a target="_blank" href="https://www.apple.com/icloud/find-my/">iCloud Find My</a>. Not only because I tend to misplace my keys or my phone, but because of the overall experience of the ecosystem. But lately I had to learn the limitations the hard way. You can't use Find My for all Apple devices, and they excluded probably one of the most frequently lost devices: the Apple Pencil. I had to be creative to come up with an approximation of where to find a lost one. Read about it in this post.</p>
<h2 id="heading-the-mission">The Mission</h2>
<p>A few days back my sister told me that she lost her Apple Pencil at the University. My naive answer was (being the Find My guy): check the Find My App and you should see the last location. Unfortunately she couldn't find the Pencil in her app. Double-checking the <a target="_blank" href="https://www.apple.com/icloud/find-my/">Find My landing page</a> she seems to be right - Apple Pencil is not supported by the app.</p>
<p>My first thoughts were: well, it's almost Christmas, but my second thought was about helping to find her belonging.</p>
<p>There are at least the following approaches to find a lost Bluetooth device:</p>
<ol>
<li><p>Searching where it got lost</p>
</li>
<li><p>(paid) Apps</p>
</li>
<li><p>Trying Bluetooth pairing (maybe the device is still around)</p>
</li>
<li><p><strong>Digging through logs</strong></p>
</li>
</ol>
<p>In this post I'll discuss all of them but will focus on the last one.</p>
<h2 id="heading-the-discovery">The Discovery</h2>
<p>Let's start with the obvious one: Searching where it was lost.</p>
<p>My sister was pretty sure that she left her Pencil at the University. It was the obvious guess: she used it there the day before and it was the last time she saw it. After calling back and forth everybody around that class room she finally gave up and admitted the loss.</p>
<p>The next day she got some new hope and searched Google for a way to uncover the last location. And you guessed it - Google is full of Ads for Bluetooth tracking Apps which claim to be the best around. They will notify you on a potential leave behind of a tracked peripheral. Well, it's working <strong>as long as you're "planning ahead"</strong> but none of them will recover the last seen location of an already lost device. What a bummer.</p>
<p>Later that day she approached me with the sad news - so I did the same: searching the web. My first hits were some geek websites suggesting "walking around" and waiting for a connection - expecting their audience never to leave their home and therefore to lose stuff only within Bluetooth range. The rest of the results were pretty much the same as already discussed earlier.</p>
<h2 id="heading-the-improvisation">The Improvisation</h2>
<p>As I worked in IT support / Systems Engineering for quite some time, one of the first things I tend to do when anything is broken or lost is checking the logs (or, more broadly, any of the <a target="_blank" href="https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/ch04.html">three pillars</a>). Usually you can just connect your iPhone to your Mac, trust the computer and use the Console app to search through the logs.</p>
<p>Well, getting the logs of an iPad OS running device that is not anywhere around (my sister is not that tech-savvy and living some hundred kilometers away) turned out to be more tricky than just plugging in and using the Mac Console. Luckily the Apple developer forums are not that extensive but clear enough <a target="_blank" href="https://developer.apple.com/forums/thread/80811">to learn about sysdiagnose</a>.</p>
<p>For my devices running the latest iOS 15 it was straight forward to trigger a sysdiagnose snapshot, holding all physical keys for a few milliseconds:</p>
<pre><code class="lang-plaintext">[Volume Up] + [Volume Down] + [Power]
</code></pre>
<p>It's really less than a second - <strong>don't expect any feedback</strong> (sometimes you hit a screenshot).</p>
<p>Then there is a crucial part for all of us impatient people: <strong>Wait a minute or two!</strong> You will be rewarded with an archive of significant size to explore. Find it in the privacy settings:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514796386/4376e265-8e13-4e5e-95fe-8ba33b3dd943.png" alt class="image--center mx-auto" /></p>
<p>Notice the timestamp? - Click on the latest report and share it to your Mac - I had to use iCloud as AirDrop was no option due to the distance.</p>
<p>Once the archive is loaded to your Mac - unpack (double click does the Apple-Magic!). Let's have a look what's in there:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514809096/8425c2dd-9d7e-4ce5-bc75-f2300f7b1880.png" alt class="image--center mx-auto" /></p>
<p>That's sweet - a bunch of diagnostic reports to dig through. <strong>Take your time and look around on your own</strong>. Spoiler: for our next step the <code>logs</code> folder won't be as helpful as I initially expected.</p>
<h2 id="heading-the-final-sprint">The Final Sprint</h2>
<p>I probably did the same as you, clicking through all the folders and peeking into the files. After playing around with <code>grep</code>, the <code>system_logs.logarchive</code> surfaced as the most promising, with a lot of hits for the keyword <code>Apple Pencil</code>:</p>
<pre><code class="lang-bash">grep -irFe <span class="hljs-string">"Apple Pencil"</span>
</code></pre>
<p><img src="https://anthu.dev/content/images/2021/12/image-3.png" alt /></p>
<p>But how to read these binary files? Obviously you can load them into the built-in Mac tool "Console" (it has nothing to do with the terminal; it's rather a tool to read the logs from the Mac or connected Apple devices). For whatever reason the imported archive did not show any <code>Apple Pencil</code> matches within the Console, although <code>grep</code> found something.</p>
<p>Last but not least I found the <code>log</code> command which is also delivered out of the box and presumably the CLI for the Console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514823837/6545b293-58c7-4b92-9db5-2762944fe1cb.png" alt class="image--center mx-auto" /></p>
<p>Great! Let's put it together and find the lost Apple Pencil:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">log</span> show --archive system_logs.logarchive | grep -iF <span class="hljs-string">"Apple Pencil"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514851896/9b032839-ad2d-4753-a5b7-f417ce377911.jpeg" alt class="image--center mx-auto" /></p>
<p>At this point look for pattern changes. You should see <strong>disconnect, power, found and loss events</strong> - they will hint at the last-seen time.</p>
<p>Unfortunately this approach won't give you a location, but it will give you a quite accurate timestamp of when the Pencil was last seen. In the case of my sister, we were able to combine the timestamp with her Google Maps Location History and pinpoint the loss location. To both of our surprise it wasn't the University but, hours later, a public parking lot.</p>
<p>Even if you don't have Google Location History you might be able to pinpoint the location, too - look at your chats, emails, calls &amp; be creative! But if you do have Google Maps Location History, be curious about the upcoming posts where I will go through the Location History and use Google Takeout to get a better understanding of what Google is saving about you.</p>
]]></content:encoded></item><item><title><![CDATA[Azure DevOps: Get short commit hash in build pipeline]]></title><description><![CDATA[When building potentially deployable artifacts as part of the build pipeline it’s crucial to identify each produced artifact based on the source of the build. Helpful tracers I’m using are branch name, commit hash and the build number. In my current...]]></description><link>https://anthu.dev/azure-devops-short-hash</link><guid isPermaLink="true">https://anthu.dev/azure-devops-short-hash</guid><category><![CDATA[azure-devops]]></category><dc:creator><![CDATA[Anton Huck]]></dc:creator><pubDate>Tue, 27 Jul 2021 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768514978521/f110a150-5dbf-40c2-b030-6557d48b995b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>When building potentially deployable artifacts as part of the build pipeline it’s crucial to identify each produced artifact based on the source of the build. Helpful tracers I’m using are branch name, commit hash and the build number. In my current project I’m using Azure DevOps Pipelines for building and they provide a lot of helpful predefined variables (find a full list</strong> <a target="_blank" href="https://docs.microsoft.com/en-us/azure/devops/pipelines/build/variables?view=azure-devops&amp;tabs=yaml"><strong>here</strong></a><strong>). Anyway, there is one variable I’m missing: the short commit hash.</strong></p>
<p>Of course Microsoft could provide this variable out of the box, but I also get the point that requirements differ widely. While I prefer the 8-character hash, others might be happy with seven or ten characters. I will show you that it’s not that complicated to trim your own commit hash, and share the snippet I’m using for almost every pipeline:</p>
<pre><code class="lang-yaml"><span class="hljs-bullet">-</span> <span class="hljs-attr">powershell:</span> <span class="hljs-string">|
    $shortHash = $env:BUILD_SOURCEVERSION.Substring(0, 8)
    Write-Host "##vso[task.setvariable variable=shortHash]$shortHash"</span>
</code></pre>
<p>In bash it’s even easier:</p>
<pre><code class="lang-yaml"><span class="hljs-bullet">-</span> <span class="hljs-attr">bash:</span> <span class="hljs-string">|</span>
    <span class="hljs-string">echo</span> <span class="hljs-string">"##vso[task.setvariable variable=shortHash]${BUILD_SOURCEVERSION:0:8}"</span>
</code></pre>
<p>And this is how you can use it in your pipeline later (remember that the template expression syntax <code>${{shortHash}}</code> won’t work, as the variable needs to be evaluated at runtime):</p>
<pre><code class="lang-yaml"><span class="hljs-bullet">-</span> <span class="hljs-attr">task:</span> <span class="hljs-string">DotNetCoreCLI@2</span>
  <span class="hljs-attr">displayName:</span> <span class="hljs-string">Publish</span> <span class="hljs-string">.NET</span> <span class="hljs-string">Application</span>
  <span class="hljs-attr">inputs:</span>
    <span class="hljs-attr">command:</span> <span class="hljs-string">'publish'</span>
    <span class="hljs-attr">arguments:</span> <span class="hljs-string">'--configuration Release --output $(Build.ArtifactStagingDirectory) --version-suffix $(shortHash)-$(Build.BuildNumber)'</span>
</code></pre>
]]></content:encoded></item></channel></rss>