当前位置: 首页 > 科技观察

说说SQLServer行不能跨页的问题

时间:2023-03-20 22:35:06 科技观察

一:背景1.上一篇写完故事后,马上有朋友留言对记录行的8060byte限制的疑惑,因为他的tablerecordstorage文章数量较多,用于存储文章的字段类型为nvarchar(max),长度明显超过8060byte。底层是如何突破8060byte限制的?说实话,这是个好问题,本质上是说不能突破8060字节的限制。如果我要处理的话,文章的数据肯定是分布在多个数据页上的。我是这么想的吗?让我们来看看。二:观察大字段数据的布局1.对nvarchar(max)的理解玩过sqlserver的朋友都知道,新一代的sqlserver已经将早期的text和nvarchar(max)换成了varchar(max)和nvarchar(max)ntext,理论上这个类型最多可以存储2-1的31次方,也就是2G左右,然后我们像nvarchar(max)一样插入1w个字符,大概20k的数据,如果取整的话应该用3测试代码如下:USEMyTestDBGOCREATETABLEt7(aINTIDENTITY,bNVARCHAR(MAX))GOINSERTINTOt7VALUES(REPLICATE(CAST('x'ASNVARCHAR(max)),10000))SELECTLEN(b)FROMt7;DBCCTRACEON(3604)DBCCIND(MyTestDB,t7,-1)从图中可以看出居然有4个数据页,很奇怪。快速看一下,一个是In-rowdata,也就是行内数据,是一种常见的数据页,第三个是LOB数据,即大值数据(LargeObjectData),它是一个特殊的LOB数据页,看来这1wx应该分给这3个LOBdata数据页,所以我们可以用DBCCPAGE把这4个数据页的内容导出看看。PAGE:(1:464)Page@0x00000175CBB46000m_pageId=(1:464)m_headerVersion=1m_type=1m_typeFlagBits=0x0m_level=0m_flagBits=0x8000m_objId(AllocUnitId.idObj)=202m_indexId(AllocUnitId.idInd)=256Metadata:AllocUnitId=72057594051166208元数据:PartitionId=72057594044022784元数据:IndexId=0元数据:ObjectId=1637580872m_prevPage=(0:0)m_nextPage=(0:0)pminlen=8m_slotCnt=1m_freeCnt=8031m_freeData=159m_reservedCnt=(0m_ls29))m_xactReserved=0m_xdesId=(0:0)m_ghostRecCnt=0m_tornBits=0DBFragID=1DATA:000000482E3F8000:0101000000800001000000000000080000000000.....................000000482E3F8014:00000100ca0000005f1f9f00d001000001000000.........................000000482E3F808C:01000001000000204e0000c801000001.................000000482E3F80A0:0000780078007800780078007800780078007800..x.x.x.x.x.x.x.x.x.000000482E3F80B4:7800780078007800780078007800780078007800x.x.x.x.x.x.x.x.x.x....000000482E3F9FCC:7800780078007800780000002121212121212121x.x.x.x.x...!!!!!!!!000000482E3F9FE0:2121212121212121212121212121212121212121!!!!!!!!!!!!!!!!!!!!!!!!!!000000482E3F9FF4:212121212121212121216000!!!!!!!!!!!.OFFSETTABLE:Row-Offset0(0x0)-96(0x60)页面:(1:456)数据:内存转储@0x0000000048355f8000000000000000000000000000483A478000:0103000000800001000000000000000000000000000000000000000000000000000000000000000000000000000000000000来入.........@............000000483A478028:26000000780b0000240000000000000000000000&...x...$......000000483A47803C:0000000001000000000000000000000000000000.....................000000483A478050:0000000000000000000000000000000000000000005E0F..................................................................................................0000483A478064:0000F3060000000003007800780078007800780078007800.....................................................................62626262.x.x.x.x.x...bbbbbbb000000483A479FB8:6262626262626262626262626262626262626262bbbbbbbbbbbbbbbbbbbb000000483A479FCC:6262626262626262626262626202000000002121bbbbbbbbbbbbb.....!!000000483A479FE0:2121212121212121212121212121212121212121!!!!!!!!!!!!!!!!!!!!0000483A479FF4:212121212121212121216000!!!!!!!!!!!!!!.........000000483BA78014:00000100cb0000002800d61fc901000001000000........000000482EDF8050:0000000000000000000000000000000000000800761F.................................................................................................................................................0000482EDF8064:0000F3060000000003007800780078007800780078007800.....................................................................!!!!!!!!!!!!!!!!!!000000483BA79FF4:212121212121212121216000!!!!!!!!!转储@0x000000483BA78000。..000000483BA78050:0000000000000000000000000000000000000000000000000000000000000000000000000000000000来!!!000000483BA79FE0:2121212121212121212121212121212121212121!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!相信很多朋友都很奇怪,为什么第464个数据页里面有很多x。其实这些x都算垃圾数据,从m_freeCnt=8031可以知道。这个字段表示当前数据页的空闲空间,所以这1w个x被LOB数据页吃掉了,和推测一致文章的开头这也算是解决了朋友的问题,但是如果你想打破砂锅问底线,你肯定想知道这4个数据页在内存中是如何组织的,或者说是如何串联起来的?接下来,我们好好聊聊吧。2.4个数据页是如何组织的?观察464号数据页和LOB数据页有什么关系?这是对基础知识的考验。在真正的行数据之前,记录了一个FID:PID:SID的内存存储,即:文件ID:数据页ID:slotID,可以用WinDbg观察到。0:125>dp000000482E3F8000+0x60+0x700000048`2e3f8067803f0001`7800020000000001`3500000400000048`2e3f807700001f68`000006f300000001`000001c900000048`2e3f8087000001ca`00003ed000004e20`0000000100000048`2e3f809700000001`000001c878007800`7800000000000048`2e3f80a778007800`7800780078007800`7800780000000048`2e3f80b778007800`7800780078007800`7800780000000048`2e3f80c778007800`7800780078007800`7800780000000048`2e3f80d778007800`7800780078007800`78007800简单解释一下:000000482E3F8000?是数据页在内存中的首地址,000000482E3F8000+0x60?是数据页内第一一条记录的地址,加上+0x7是为了内存地址对齐。仔细观察内存地址000000482e3f8097上的内容是00000001000001c8,对应SID(2byte),FID(2byte),PID(4byte),那么PID=0x000001c8是什么?可以用WinDbg计算出是第456个数据页。0:125>?0x1c8Evaluateexpression:456=00000000`000001c8按照这个理论继续看内存地址,会发现00000001000001c9和00000001000001ca,分别对应数据页457和458。至此,脑海中浮现出一幅画面,如下所示。三:总结经过本文的分析,你知道SQLSERVER会使用特殊的LOB数据页来存储这些大字段。由于数据被拆分到多个数据页,这给select操作增加了更多的逻辑,同时也会导致C++代码多次在LOB数据页上行走,给查询性能增加了巨大的开销。例如,以下SQL查询。SETSTATISTICSIOONSELECT*FROMt7;SETSTATISTICSIOOFF可以发现在LOB数据页上跑了7次,加上2次数据观察。INSERTINTOt7VALUES(REPLICATE(CAST('y'ASNVARCHAR(max)),10000))INSERTINTOt7VALUES(REPLICATE(CAST('z'ASNVARCHAR(max)),10000))SETSTATISTICSIOONSELECT*FROMt7;SETSTATISTICSIOOFF这次从7次变成了23次。一般来说,尽量不要在数据库中存储大字段。

最新推荐
猜你喜欢