BQFBD – Overview

  BigQuery, BigQuery for Big Data Engineers

Section 1: Intro to GCP and its services

 

Section 2: Intro to BigQuery

8. Conventional Data Warehouse Problems

9. What is BigQuery

BigQuery is a fully managed, serverless, highly scalable, and cost-effective cloud data warehouse designed for business agility.

  • Both batch and streaming data ingestion
    • Streaming: can ingest 100,000 rows per second
    • Batch: terabytes of data per second
  • Supports AI and ML
    • BigQuery ML
    • Integration with the AI Platform
      • Prediction and TensorFlow
  • Fully managed
  • Scalability
  • Pay as you go
    • Pay separately for storage and compute
    • Pay for the bytes your query processes
    • Query results are cached, so rerunning the same query is not charged a second time
  • Automated data transfer
    • Fully managed data transfer
    • Transfer from Teradata and S3 to BigQuery
  • Access control
    • Uses IAM (Identity and Access Management)
    • Assign permissions such as read, write, and running jobs per project
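
To make the pay-per-bytes model above concrete, here is a minimal Python sketch of how an on-demand query cost could be estimated. The function name `estimate_query_cost`, the per-TiB billing unit, and the price of 5.0 are illustrative assumptions, not figures from the course; check current pricing before relying on any number.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_query_cost(bytes_processed: int,
                        price_per_tib: float,
                        cache_hit: bool = False) -> float:
    """Estimate what a query costs under a pay-per-bytes-processed model.

    price_per_tib is a placeholder supplied by the caller (an assumption).
    """
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    if cache_hit:
        # Results served from the cache are not billed again.
        return 0.0
    return bytes_processed / TIB * price_per_tib


if __name__ == "__main__":
    half_tib = TIB // 2
    print(estimate_query_cost(half_tib, price_per_tib=5.0))                  # first run
    print(estimate_query_cost(half_tib, price_per_tib=5.0, cache_hit=True))  # cached rerun
```

In practice the `bytes_processed` figure could come from a dry run of the query rather than a guess; the sketch only covers the arithmetic.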

10. Out-of-the-Box (OOB) Features

https://www.udemy.com/course/bigquery/learn/lecture/22717593#overview

  • BQ GIS
    • Geographic Information System
    • Obtain insights from geographic data points using Long/Lat
  • Automatic backup
    • Keeps a 7-day history of changes
  • Integration with other GCP services
    • Dataproc
  • Foundation for BI
    • Seamless integration, transformation, analysis, visualization
  • Programmatic Interaction
    • REST API
    • Client libraries in Java, Python, Node.js, C#, Go, Ruby and PHP
  • Security
    • Data is encrypted at rest and in transit
    • Each data block is encrypted with a different key
  • Logging, monitoring and alerting
    • Cloud Audit Logs
  • Federated queries
    • Process data in object storage
      • Open-source formats such as Parquet and ORC
    • Process transactional databases and other external sources
      • Bigtable, Cloud SQL, spreadsheets in Drive
        • You can pull data directly from a CSV file…
  • Data Science Workloads
    • Spark, TensorFlow, scikit-learn
    • No need to have multiple copies of the same data
  • Powerful data repository
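
As an illustration of the programmatic interaction point above, here is a minimal Python sketch of running a query through a client object. It assumes the client behaves like `google.cloud.bigquery.Client`, where `client.query(sql)` returns a job and `job.result()` yields rows. The helper `run_query`, the `StubClient` and `StubJob` stand-ins, and the table name are hypothetical and exist only so the sketch runs without credentials.

```python
from typing import Any, Dict, List


def run_query(client: Any, sql: str) -> List[Dict[str, Any]]:
    """Run a SQL statement and return the result rows as plain dicts."""
    query_job = client.query(sql)  # submit the query job
    rows = query_job.result()      # wait for the job and fetch its rows
    return [dict(row.items()) for row in rows]


class StubJob:
    """Hypothetical stand-in for a query job (not part of any library)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def result(self) -> List[Dict[str, Any]]:
        return list(self._rows)


class StubClient:
    """Hypothetical stand-in for a BigQuery client (not part of any library)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows
        self.last_sql = None

    def query(self, sql: str) -> StubJob:
        self.last_sql = sql  # remember what was asked, for inspection
        return StubJob(self._rows)


if __name__ == "__main__":
    # With the real library this would instead be (needs credentials):
    #   from google.cloud import bigquery
    #   client = bigquery.Client()
    client = StubClient([{"word": "the", "total": 3}])
    sql = "SELECT word, total FROM `my_project.my_dataset.words`"  # made-up table
    print(run_query(client, sql))
```

Passing the client in as a parameter keeps the helper easy to test, since a stub can replace the real service.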

11. Architecture of BigQuery

https://www.udemy.com/course/bigquery/learn/lecture/22717627#overview

  • Query engine – Dremel
    • Combines columnar data layouts with a tree architecture
  • File system – Colossus
    • Google’s distributed file system; data is stored in columnar format
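
The two ideas above, columnar layout and tree-style execution, can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not how Dremel or Colossus are implemented: the sample rows, `to_columns`, and `tree_sum` are all made up for illustration.

```python
from typing import Any, Dict, List

# Made-up sample data in row-oriented form.
rows = [
    {"user": "a", "country": "US", "amount": 10},
    {"user": "b", "country": "DE", "amount": 20},
    {"user": "c", "country": "US", "amount": 30},
    {"user": "d", "country": "FR", "amount": 40},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def tree_sum(values: List[int], fanout: int = 2) -> int:
    """Sum values level by level: each node adds up `fanout` children
    until a single root value remains."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not values:
        return 0
    level = list(values)
    while len(level) > 1:
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


columns = to_columns(rows)
# Only the "amount" column is read; "user" and "country" are never touched.
total = tree_sum(columns["amount"])
print(total)  # prints 100
```

Reading a single column is the reason a columnar layout pairs well with billing by bytes processed: a query that needs one column does not have to scan the others.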

 

 

 
