{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/btree","result":{"data":{"markdownRemark":{"id":"a009df2b-4072-534e-991e-46567eb6cb64","html":"<h4 id=\"preface\" style=\"position:relative;\"><a href=\"#preface\" aria-label=\"preface permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Preface</h4>\n<p>This is the second article in a series summarizing the key concepts of Chapter 3, Storage and Retrieval, in the <a href=\"https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Designing Data-Intensive Applications</a> book. 
The series consists of 3 articles, including log-structure storage engine (SSTables and LSM Tree), page-oriented storage engine (B Tree) and column based storage engine.</p>\n<h4 id=\"landscape-of-database-storages\" style=\"position:relative;\"><a href=\"#landscape-of-database-storages\" aria-label=\"landscape of database storages permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Landscape of database storages</h4>\n<figure>\n\t<span class=\"gatsby-resp-image-wrapper\" style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/b0453fbb95c32586f445dca4f9795c72/3e992/storage-engine-tree.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 46.25%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAACXBIWXMAAAsSAAALEgHS3X78AAAA+ElEQVQoz32SiYrFMAhF8/+/WCil0H3f9zpzhEDa92aEiyHR69Vo5Nfu+xbrXfxl0zRJFEXi+760bfvINW8yt8B5nrIsi+z7rmf8dV0yDIOUZamk4zg+ipm3Gns+jkOqqlIV+KIolKSua/VJkojnedL3vcbO8yzruopxSVBDMgkEUn3bNgXBvHPmnrg0TaXrOsmyTIIgUG/c9khyQeVvRixEkKIMUEQVooZ2UMSFNcjsnJqm0bkBkmkbNWEY6t1jhrYNC3eW/Gae55pEHKAoxWmX+XJ+/LJL4BID1JAA0XuteLcb8LE2FhDQHrvFjOI41mGj8j9zt+QHDZm8w9QVNO4AAAAASUVORK5CYII=&apos;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source 
srcset=\"/static/b0453fbb95c32586f445dca4f9795c72/8ac56/storage-engine-tree.webp 240w,\n/static/b0453fbb95c32586f445dca4f9795c72/d3be9/storage-engine-tree.webp 480w,\n/static/b0453fbb95c32586f445dca4f9795c72/e46b2/storage-engine-tree.webp 960w,\n/static/b0453fbb95c32586f445dca4f9795c72/f992d/storage-engine-tree.webp 1440w,\n/static/b0453fbb95c32586f445dca4f9795c72/97599/storage-engine-tree.webp 1902w\" sizes=\"(max-width: 960px) 100vw, 960px\" type=\"image/webp\">\n        <source srcset=\"/static/b0453fbb95c32586f445dca4f9795c72/8ff5a/storage-engine-tree.png 240w,\n/static/b0453fbb95c32586f445dca4f9795c72/e85cb/storage-engine-tree.png 480w,\n/static/b0453fbb95c32586f445dca4f9795c72/d9199/storage-engine-tree.png 960w,\n/static/b0453fbb95c32586f445dca4f9795c72/07a9c/storage-engine-tree.png 1440w,\n/static/b0453fbb95c32586f445dca4f9795c72/3e992/storage-engine-tree.png 1902w\" sizes=\"(max-width: 960px) 100vw, 960px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/b0453fbb95c32586f445dca4f9795c72/d9199/storage-engine-tree.png\" alt=\"Landscape of database storages\" title=\"Landscape of database storages\" loading=\"lazy\" style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span>\n</figure>\n<p>In short, database storage engines fall into two categories: row-based and column-based. We will cover the differences between them in the next article. 
In this article, we focus on page-oriented storage, which belongs to the row-based category.</p>\n<h2 id=\"page-oriented-storage-engine\" style=\"position:relative;\"><a href=\"#page-oriented-storage-engine\" aria-label=\"page oriented storage engine permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Page-oriented storage engine</h2>\n<p>Although log-structured storage engines have had a profound influence on database evolution, the most widely adopted design is actually the page-oriented storage engine.</p>\n<p>Unlike a log-structured storage engine, a page-oriented storage engine <strong>updates the value in place</strong> (in its original page) instead of appending data to the latest segment file.</p>\n<p>The B Tree, which is used by MySQL, PostgreSQL and MongoDB, is the most widely adopted data structure for this kind of engine.</p>\n<h3 id=\"b-tree\" style=\"position:relative;\"><a href=\"#b-tree\" aria-label=\"b tree permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>B Tree</h3>\n<figure>\n\t<span class=\"gatsby-resp-image-wrapper\" style=\"position: 
relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; \">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/aceba302c4d7fedfdc314c15790959b5/11d70/b-tree.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 35.833333333333336%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAHCAYAAAAIy204AAAACXBIWXMAAAsSAAALEgHS3X78AAAAy0lEQVQoz3VRawuEMAzb//+J4sDHFMQvPvFt7lLIMYUrhG5dSJvO4Rv3fUP5OA6c54nrujBNE0II6LrO3vZ9t7q4QhwuLpLsvUdd18jzHGVZom1bZFmGpmmQpqndY7E3HF7B4rquvymF2AHxL9y2bUagHYJCy7I8GlCQPGZyuAo2lStxbMIkScxiURRmbRgGjONo0JkCXIVsk9/3PeZ5fsAE2VFTUoCfwEfWOQVz/GnkKuhG0GoeO5Sl989JrKoqa8hJOSVXwwZaF88fDGQhy2F2KHAAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source srcset=\"/static/aceba302c4d7fedfdc314c15790959b5/8ac56/b-tree.webp 240w,\n/static/aceba302c4d7fedfdc314c15790959b5/d3be9/b-tree.webp 480w,\n/static/aceba302c4d7fedfdc314c15790959b5/e46b2/b-tree.webp 960w,\n/static/aceba302c4d7fedfdc314c15790959b5/f992d/b-tree.webp 1440w,\n/static/aceba302c4d7fedfdc314c15790959b5/d1be1/b-tree.webp 1802w\" sizes=\"(max-width: 960px) 100vw, 960px\" type=\"image/webp\">\n        <source srcset=\"/static/aceba302c4d7fedfdc314c15790959b5/8ff5a/b-tree.png 240w,\n/static/aceba302c4d7fedfdc314c15790959b5/e85cb/b-tree.png 480w,\n/static/aceba302c4d7fedfdc314c15790959b5/d9199/b-tree.png 960w,\n/static/aceba302c4d7fedfdc314c15790959b5/07a9c/b-tree.png 1440w,\n/static/aceba302c4d7fedfdc314c15790959b5/11d70/b-tree.png 1802w\" sizes=\"(max-width: 960px) 100vw, 960px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/aceba302c4d7fedfdc314c15790959b5/d9199/b-tree.png\" alt=\"Landscape of database storages\" title=\"Landscape of database storages\" loading=\"lazy\" 
style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span>\n</figure>\n<h4 id=\"how-it-works\" style=\"position:relative;\"><a href=\"#how-it-works\" aria-label=\"how it works permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>How it works</h4>\n<ul>\n<li>\n<p>For a read request:</p>\n<ul>\n<li>Binary search the keys in the root page</li>\n<li>Follow the reference to the corresponding child page if necessary, or read the value of the matching key directly</li>\n</ul>\n</li>\n<li>\n<p>For a write request:</p>\n<ul>\n<li>Find the corresponding node first</li>\n<li>If found, update the value in that page directly</li>\n<li>If not found, insert a node into the page.</li>\n<li>If the page is full, split it into two pages, distribute the nodes evenly between them, and update the parent page to reference both</li>\n</ul>\n</li>\n<li>\n<p>Branching factor: the number of references to child pages in one page, typically several hundred.</p>\n<ul>\n<li>Since a B Tree is a balanced tree, the depth of the tree is O(log n), where n is the number of key-value pairs.</li>\n<li>A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB.</li>\n</ul>\n</li>\n</ul>\n<h4 id=\"pros--cons\" style=\"position:relative;\"><a href=\"#pros--cons\" aria-label=\"pros  cons permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 
3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Pros &#x26; Cons</h4>\n<ul>\n<li>Better read performance: because a lookup searches an N-ary search tree, where N is the branching factor, the time complexity is O(log n). By contrast, the worst case for a log-structured storage engine is close to O(n).</li>\n<li>Worse write performance: a write in a log-structured storage engine simply appends data, but in a B Tree we need to search for the key first, load the target page, update the value, and then write the page back.</li>\n<li>More complex concurrency control &#x26; crash recovery.</li>\n</ul>\n<h4 id=\"comparison-matrix-of-log-structure--page-oriented-storage-engine\" style=\"position:relative;\"><a href=\"#comparison-matrix-of-log-structure--page-oriented-storage-engine\" aria-label=\"comparison matrix of log structure  page oriented storage engine permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Comparison matrix of Log-structure &#x26; Page-oriented Storage engine</h4>\n<table>\n<thead>\n<tr>\n<th>Category</th>\n<th>Pros &#x26; Cons</th>\n<th>Well-known data structure</th>\n<th>Real-world DB</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Log-structured</td>\n<td>Better write performance</td>\n<td>SSTables &#x26; LSM Tree</td>\n<td>Cassandra, LevelDB, HBase, 
Lucene</td>\n</tr>\n<tr>\n<td>Page-oriented</td>\n<td>Better read performance; more complex concurrency control &#x26; crash recovery</td>\n<td>B Tree</td>\n<td>MySQL, PostgreSQL, MongoDB</td>\n</tr>\n</tbody>\n</table>\n<h4 id=\"about-write-performance\" style=\"position:relative;\"><a href=\"#about-write-performance\" aria-label=\"about write performance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>About write performance</h4>\n<ul>\n<li>\n<p>Write amplification: how many disk writes does a single write request produce?</p>\n<ul>\n<li>For write-intensive applications, disk I/O is usually the bottleneck. Log-structured storage engines usually have lower write amplification, so their write performance is usually better.</li>\n</ul>\n</li>\n<li>\n<p>Log-structured engines have higher write latency at high percentiles:</p>\n<ul>\n<li>Although log-structured storage engines usually have better write performance, their writes sometimes have to compete with the disk I/O of the background compaction process. 
So log-structured storage engines usually have higher write latency at high percentiles.</li>\n<li>On the other hand, the write latency of page-oriented storage engines is more stable.</li>\n</ul>\n</li>\n</ul>\n<h3 id=\"summary\" style=\"position:relative;\"><a href=\"#summary\" aria-label=\"summary permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Summary</h3>\n<p>In this article, we walked through the design of the page-oriented storage engine, mainly the B Tree. Because data is updated in place in its page, write performance is generally worse. But since it’s a balanced tree structure, read performance can be much better.</p>\n<p>In short, if the workload has many more reads than writes, we can consider using a page-oriented storage engine. On the other hand, if we need to handle a write-intensive application, we can consider using a log-structured storage engine.</p>\n<p>In the next article, we will talk more about column-based storage engines, which follow a very different design philosophy from row-based storage engines. Let’s go.</p>","fields":{"slug":"/posts/btree","tagSlugs":["/tag/database-storage/","/tag/b-tree/","/tag/page-oriented-storage-engine/","/tag/designing-data-intensive-application/"]},"frontmatter":{"date":"2021-12-18T23:12:04.772Z","description":"The B Tree is one of the best-known data structures used by page-oriented storage engines (e.g. MySQL). Page-oriented storage engines have relatively good read performance compared to log-structured storage engines. 
Let's take a look at B Tree!","tags":["Database Storage","B Tree","Page oriented storage engine","Designing Data Intensive Application"],"title":"Storage and Retrieval (2) - B Tree","socialImage":{"publicURL":"/static/e1c2867b251a4f300016b135407f731f/social.jpg"}}}},"pageContext":{"slug":"/posts/btree"}},"staticQueryHashes":["251939775","401334301","825871152"]}