SnowFlake In 5 Minutes

Cloud Data Warehouses are a thing now!

Sowmya Mupparaju
Geek Culture

--

Photo by Darius Cotoi on Unsplash

Getting Started:

The Snowflake data warehouse is a cloud-based platform that stores and retrieves data. To begin learning about SnowFlake, let’s understand why companies need a warehouse. In the outside world, what is a warehouse? A space designed for storing and handling goods and raw materials efficiently. Likewise, traditional data warehouses are used to store large collections of data used to make business decisions. However, when we refer to SnowFlake’s data warehouse, we are often referring to the computing power that enables the processing of the data, not the storage of the data.

Introduction to SnowFlake:

Snowflake is a leading cloud-based platform that is provided as SaaS ( Software as a Service). There is no hardware (virtual or physical) to select, install, configure, or manage. There is virtually no software to install, configure, or manage. Snowflake does everything for you.

So what makes Snowflake so unique? One Word Architecture. Snowflake’s unique architecture is a hybrid of shared-disk(uses central repo for data that is accessed from all nodes) and shared-nothing( Queries uses MPP) architectures and consists of three key layers: The documentation for each is provided detailedly by Snowflake.

From SnowFlake

SnowFlake Key Features:

The legacy data warehouses and other on-premise data warehouses, are not scalable, require high maintenance, and require expensive support and licensing. Additionally, there is also a risk of data loss with on-perm solutions, which can lead to several problems. You get the best of both worlds with Snowflake — you can scale and be effective at the same time. Apart from Snowflake’s scalability and cloud-native elasticity, another feature worth mentioning is its ability to handle both structured and Semi-structured data.

Why Choose SnowFlake?

What does SnowFlake have to offer over a traditional warehouse?

Does it Provide Security, Governance?: Yes, it supports standard user authentication, MFA, OAuth and Single Sign-On. It also provides data encryption, column-level security, object-level access control.

Data Protection: SnowFlake Fail-Safe feature ensures historical data is protected in the event of a system crash or a breach. Snowflake Time Travel feature enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period(1 day is standard for all accounts).

Standard SQL Support & Connectivity: The market has seen several databases with high performance, but the majority do not offer full relational SQL support such as union queries, Cross-schema views etc. SnowFlake was built from the start to support a majority of SQL functionalities( Most DDL and DML, Transient & Temporary Tables, Lateral & Materialized Views, Aggregate & Windows Functions, UDF, Stored procedures and Recursive CTE.

Connectivity: There are an extensive set of drivers or connectors supported by SnowFlake( Python, Spark, Node.JS, .NET, JDBC, ODBBC etc) and it also supports BI Tools.

Your Choice of Data: Snowflake supports loading any data that uses a supported character encoding, from compressed files, loads most flat, delimited data files (CSV, TSV, etc.), loads data files in JSON, Avro, ORC, Parquet, and XML format, Loads from S3 data sources. For continuous bulk loading data from files: Use SNOWPIPE to load data in micro-batches from internal (i.e. Snowflake) stages or external (Amazon S3, Google Cloud Storage, or Microsoft Azure) stages.

✓ Management: Snowflake needs zero management.

Pay as you go: Snowflake Billing, customers do not pay for “off” times when virtual warehouses are idle, and they can suspend and resume warehouses as they see fit

CDC: Any warehouse needs change data capture, especially for larger tables, so that only the changed records are retrieved, not the entire table. Snowflake Streams captures inserts and other DML changes made to a table

Results Caching: Upon executing a query, the result is cached for the current duration (currently 24 hours). Results are purged as soon as the time frame expires.

SnowFlake Continuos Data Pipelines:

From cloud warehouse to secure data sharing and beyond. Check here

From SnowFlake Docs

Continuous Data Loading: Continuous data loading options include:

  • Snowpipe: Loads micro-batches of data continuously from an external staging location(S3, Azure, GCP) into a staging table.
  • Snowflake Connector for Kafka: A Kafka connector continuously loads records from an Apache Kafka topic into an internal staging stage (Snowflake), and then into staging tables using Snowpipe.
  • Third-party data integration tools

Change Data Capture: One or more table streams capture change data and. make it available to query.

Recurring Tasks: To process the change data, SnowFlake Tasks execute SQL statements (which could call stored procedures) and move optimized data sets to destination tables for analysis. When this transformation process runs, it selects the change data in the stream so it can perform DML operations on the destination tables and then consumes the change data when the transaction is committed.

Final Thoughts:

Overall, Snowflake offers outstanding performance, parallel processing, multi-clustered and utmost productive warehouse. Companies can start small and grow large warehouses, thanks to the snowflake architecture. Multi clustering and scaling policies are unbeatable features and warehouses have Auto Suspend settings to suspend them when not in use. In many traditional warehouses, Data Marts may not have access to all of a company’s data and may lead to inaccuracies due to data replication. Snowflake, however, has access to all of its data at all times. Depending on the compute power needed, users can scale up ( Small to Medium) and down ( X-Large to Large ) the warehouse manually, however snapping back (Max Clusters to Min Clusters) and scaling out (Min Clusters to Max Clusters) take place automatically. Snowflake can be an effective solution for any business.

Checkout Free Online Self Paced Course: SnowFlake Hands On Essentials

Resources:

--

--

Sowmya Mupparaju
Geek Culture

Senior Software Engineer @Apple | A Farmer's Daughter I Columbia MBA