当前位置: 首页 > 后端技术 > Python

运行dbt(qbit)

时间:2023-03-26 12:49:19 Python

下诗前言dbt是DataBuildTool的缩写,目前由dbtlabs公司开发维护,该公司的前身是FishtownAnalytics。dbt主要处理ETL中的T(transform)dbt主要分为两部分dbtCore:cli命令行工具dbtCloud:云服务dbtCore的github地址为:https://github.com/dbt-labs/d...qbit个人认为dbtCore的官方入门教程不??友好。一开始就需要谷歌的BigQuery服务、GitHub账号等。事实上,dbtCore完全可以在不联网的情况下离线使用。本文主要参考:【大数据架构之旅】2从零开始学习dbt技术栈Windows10Python3.8.10poetry1.3.1git2.35.1.windows.2PostgreSQL15.1poetry项目配置.../test_dbt/pyproject.toml[tool.poetry]name="test-dbt"version="0.1.0"description=""authors=["qbit"]readme="README.md"packages=[{include="test_dbt"}][[tool.poetry.source]]name="aliyun"url="https://mirrors.aliyun.com/pypi/simple/"default=true[tool.poetry.dependencies]python="^3.8"本文中dbt-core="~1.3.1"dbt-postgres="~1.3.1"[build-system]requires=["poetry-core"]build-backend="poetry.core.masonry.api"诗歌项目文件夹指.../test_dbt/本文dbt项目文件夹指.../test_dbt/dbt_demo/创建数据库并初始化项目在PostgreSQL中新建数据库dbt-demo初始化dbt项目,在.../test_dbt/目录下执行如下命令poetryrundbtinitdbt_demo。按提示执行后.../test_dbt/目录结构如下$tree.├──dbt_demo│├──analyses│├──dbt_project.yml│├──宏│├──模型││└──示例││├──my_first_dbt_model.sql││├──my_second_dbt_model.sql││└──schema.yml│├──README.md│├──seeds│├──snapshots│└──tests├──logs│└──dbt.log├──poetry.lock└──pyproject.toml在个人用户目录下找到文件~/.dbt/profiles.yml,qbit的绝对路径为C:\Users\qbit\.dbt\profiles.yml,将以下内容复制粘贴到文件中,根据你的实际情况填写方括号内的内容dbt_demo:输出:开发:类型:postgres线程:1主机:[主机]端口:[端口]用户:[dev_username]密码:[dev_password]数据库名称:dbt_demo架构:dev_schema产品:类型:postgres线程:1主机:[主机]端口:[port]user:[prod_username]pass:[prod_password]dbname:dbt_demoschema:prod_schematarget:dev如无特殊说明,以下命令均在.../test_dbt/dbt_demo/目录下运行。进入.../test_dbt/dbt_demo/目录运行内置模型以下命令查看环境信息$poetryrundbtdebug06:41:09Runningwithdbt=1.3.1dbtversion:1.3.1pythonversion:3.8.10python路径:D:\Python3Project\test_dbt\.venv\Scripts\python.exeos信息:Windows-10-10.0.19045-SP0使用位于C:\Users\qbit\.dbt\profiles.yml的profiles.yml文件使用位于D:\Python3Project\test_dbt\dbt_demo\dbt_project的dbt_project.yml文件。ymlConfiguration:profiles.yml文件[OKfoundandvalid]dbt_project.ymlfile[OKfoundandvalid]Requireddependencies:-git[OKfound]Connection:host:192.168.1.52port:5432user:postgresdatabase:dbt_demoschema:dev_schemasearch_path:Nonekeepalives_idle:0sslmode:NoneConnectiontest:[OKconnectionok]Allcheckspassed!生成自带模型$poetryrundbtrun06:47:04Runningwithdbt=1.3.106:47:04未找到部分解析保存文件。开始完整解析。06:47:04找到2个模型、4个测试、0个快照、0个分析、289个宏、0个操作、0个种子文件、0个来源、0个曝光、0个指标06:47:0406:47:05并发:1个线程(target='dev')06:47:0506:47:051of2STARTsql表模型dev_schema.my_first_dbt_model.....................[RUN]06:47:051of2OK已创建sql选项卡lemodeldev_schema.my_first_dbt_model...............[SELECT2in0.15s]06:47:052of2STARTsqlviewmodeldev_schema.my_second_dbt_model...。..........[RUN]06:47:052of2OK创建了sql视图模型dev_schema.my_second_dbt_model.........[在0.09秒内创建视图]06:47:0506:47:05在0小时0分0.53秒(0.53秒)内完成运行1个表模型、1个视图模型。06:47:0506:47:05成功完成06:47:0506:47:05完成。PASS=2WARN=0ERROR=0SKIP=0TOTAL=2查看PostgreSQL数据库,应该可以看到名为my_first_dbt_model的表,以及创建外部数据的名为my_second_dbt_model的视图该表是从加州大学下载的共享单车数据集,Irvine,并将hour.csv和day.csv放在.../test_dbt/dbt_demo/seeds文件夹中在.../test_dbt/dbt_demo/seeds目录下创建文件bike_share.yml,内容如下:version:2seeds:-name:hourconfig:column_types:dteday:date-name:dayconfig:column_types:dteday:date在这里指定日期类型是为了防止dbt自动推断类型。数据列的类型可以参考官方文档column_types,执行如下命令导入基础数据$poetryrundbtseed07:19:26Runningwithdbt=1.3.107:19:26发现2个模型,4个测试、0个快照、0个分析、289个宏、0个操作、2个种子文件、0个来源、0个曝光、0个指标07:19:2607:19:26并发:1个线程(target='dev')07:19:2607:19:261of2启动种子文件dev_schema.day.....................[RUN]07:19:311of2OK载入种子文件dev_schema.day.....................................[INSERT731in4.54s]07:19:312of2启动种子文件dev_schema.hour...........................................................[RUN]07:21:242of2OK加载种子文件dev_schema.hour............................................[INSERT17379in112.83s]07:21:2407:21:24运行完毕0小时1分57.65秒(117.65秒)内有2个种子。07:21:2407:21:24成功完成07:21:2407:21:24完成。PASS=2WARN=0ERROR=0SKIP=0TOTAL=2检查PostgreSQL数据库,你应该看到已经创建了2个新表:houranddaycustommodelstocreateamodeldirectory.../test_dbt/dbt_demo/models/bike_share,下面sql文件中的ref('day')就是引用上面导入的day数据库表在.../models/bike_share目录下创建weather_trend.sql,内容如下/*天气趋势表*/{{config(materialized='table')}}withweather_trendas(selectweathersit,min(temp*47+8)作为min_temp,max(temp*47+8)作为max_temp,avg(temp*47+8)asmean_temp,sum(casual)astotal_casual,sum(registered)astotal_registered,sum(cnt)astotal_cntfrom{{ref('day')}}groupbyweathersit)在.../models/中创建season_trend.sqlbike_share目录,内容如下/*季节趋势表*/{{config(materialized='table')}}withseason_trendas(selectseason,min(temp*47+8)asmin_temp,max(temp*47+8)asmax_temp,avg(temp*47+8)asmean_temp,sum(casual)astotal_casual,sum(registered)astotal_registered,sum(cnt)astotal_cntfrom{{ref('day')}}groupbyseason)在.../models/bike_share目录下创建month_trend.sql,内容如下/*月度趋势表*/{{config(materialized='table')}}withmonth_trendas(选择mnth,min(temp*47+8)作为min_temp,max(temp*47+8)作为max_temp,avg(temp*47+8)作为mean_temp,sum(casual)作为total_casual,sum(registered)astotal_registered,sum(cnt)astotal_cntfrom{{ref('day')}}groupbymnth)在.../models/bike_share目录下创建weekday_trend.sql,内容如下/*星期几趋势表*/{{config(materialized='table')}}withweekday_trendas(selectweekday,min(temp*47+8)asmin_temp,max(temp*47+8)asmax_temp,avg(temp*47+8)asmean_temp,sum(casual)astotal_casual,sum(registered)astotal_registered,sum(cnt)astotal_cntfrom{{ref('day')}}groupbyweekday)执行以下命令生成模型$poetryrundbtrun07:43:59Runningwithdbt=1.3.107:43:59Found6models,4tests,0snapshots,0analyzes,289个宏,0个操作,2个种子文件,0个来源,0个曝光,0个指标07:43:5907:43:59并发:1threads(target='dev')07:43:5907:43:591of6STARTsql表模型dev_schema.month_trend..........................[RUN]07:44:001of6OKcreatedsqltablemodeldev_schema.month_trend.....................[SELECT12在0.15s]07:44:002of6STARTsql表模型dev_schema.my_first_dbt_model.....................[RUN]07:44:002of6OK创建了sql表模型dev_schema.my_first_dbt_model...............[SELECT2in0.09s]07:44:003of6STARTsql表模型dev_schema.season_trend........................[RUN]07:44:003of6OK创建了sql表模型dev_schema.season_trend.....................[SELECT4in0.07s]07:44:004of6STARTsqltablemodeldev_schema.weather_trend..........................................[RUN]07:44:004of6OK创建了sql表模型dev_schema.weather_trend........................[SELECT3in0.08s]07:44:005of6STARTsql表模型dev_schema.weekday_trend......................[运行]07:44:005of6OKcreatedsqltablemodeldev_schema.weekday_trend.....................[SELECT7in0.07s]07:44:006of6STARTsqlviewmodeldev_schema.my_second_dbt_model..........[RUN]07:44:006of6OKcreatedsqlviewmodeldev_schema.my_second_dbt_model...............[CREATEVIEWin0.07s]07:44:0007:44:00在0小时0分0.78秒(0.78秒)内完成运行5个表模型,1个视图模型。07:44:0007:44:00成功完成07:44:0007:44:00完成.PASS=6WARN=0ERROR=0SKIP=0TOTAL=6文档生成和查看运行以下命令生成文档$poetryrundbtdocsgenerate07:49:09Runningwithdbt=1.3.107:49:09Found6models,4个测试,0个快照,0个分析,289个宏,0个操作,2个种子文件,0个来源,0个曝光,0个指标07:49:0907:49:09并发:1个线程(target='dev')07:49:0907:49:09Done.07:49:10Buildingcatalog07:49:10CatalogwrittentoD:\Python3Project\test_dbt\dbt_demo\target\catalog.json运行如下命令,会启动http服务,并打开查看文档的默认浏览器$poetryrundbtdocsserve07:49:53Runningwithdbt=1.3.107:49:53Servingdocsat0.0.0.0:808007:49:53要从浏览器访问,请导航至:http://localhost:808007:49:5307:49:5307:49:53按Ctrl+C退出。127.0.0.1--[28/Dec/202215:49:54]"GET/HTTP/1.1"200-127.0.0.1--[28/2022年12月15:49:54]"GET/manifest.json?cb=1672213794801HTTP/1.1"200-127.0.0.1--[28/Dec/202215:49:54]"GET/catalog.json?cb=1672213794801HTTP/1.1"200-点击web界面右下按钮查看LineageGraph的目录结构最后看一下目录结构$tree./dbt_demo/-L3./dbt_demo/├──分析├──dbt_packages├──dbt_project.yml├──日志│└──dbt.log├──宏├──模型│├──bike_share││├──month_trend.sql││├──season_trend.sql││├──weather_trend.sql││└──weekday_trend.sql│└──例子│├──my_first_dbt_model.sql│├──my_second_dbt_model.sql│└──schema.yml├──README.md├──种子│├──bike_share.yml│├──day.csv│└──hour.csv├──快照├──目标│├──catalog.json│├──已编译││└──dbt_demo│├──graph.gpickle│├──index.html│├──manifest.json│├──partial_parse.msgpack│├──run││└──dbt_demo│└──run_results.json└──tests本文来自qbitsnap