<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael</title>
    <description>The latest articles on DEV Community by Michael (@michaelfv).</description>
    <link>https://dev.clauneck.workers.dev/michaelfv</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3830930%2F34d5c2f8-f162-4df3-865b-34a96a64ac17.png</url>
      <title>DEV Community: Michael</title>
      <link>https://dev.clauneck.workers.dev/michaelfv</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.clauneck.workers.dev/feed/michaelfv"/>
    <language>en</language>
    <item>
      <title>GBase 8a Cluster Installation in Practice: From Environment Setup to Health Checks</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 23 Jun 2026 15:44:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-cluster-installation-in-practice-from-environment-setup-to-health-checks-3fia</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-cluster-installation-in-practice-from-environment-setup-to-health-checks-3fia</guid>
      <description>&lt;p&gt;The success of a GBase 8a cluster installation often hinges not on the install commands themselves, but on the pre‑installation environment preparation and post‑installation validation. This guide focuses on critical prerequisites — networking, SSH, system limits, and firewall settings — and walks through a verified workflow for verifying cluster state and distribution configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Node Planning and Component Roles
&lt;/h2&gt;

&lt;p&gt;A GBase 8a cluster consists of three component types: gcware (management nodes, 3 or 5 recommended), gcluster (coordinators), and gnode (data nodes). Plan the roles of each node before starting. Here is a sample 3‑node layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;198.51.100.21  management + coordinator + data node
198.51.100.22  management + coordinator + data node
198.51.100.23  management + coordinator + data node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. System Prerequisites to Address Before Installation
&lt;/h2&gt;

&lt;p&gt;Nodes must use static IPs, have full network connectivity between them, and have hostname resolution properly configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network and SSH Checks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Node connectivity&lt;/span&gt;
ping 198.51.100.22
ping 198.51.100.23

&lt;span class="c"&gt;# SSH connectivity&lt;/span&gt;
ssh root@198.51.100.22
ssh root@198.51.100.23
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Firewall and SELinux Checks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status firewalld
sestatus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Handling Non‑Default SSH Ports
&lt;/h2&gt;

&lt;p&gt;If SSH is not running on port 22, specify the custom port either through user‑level SSH configuration or the installation options file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: User‑level SSH config&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.ssh/config
Host 198.51.100.22 198.51.100.23
    Port 22022
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B: Install options file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# install.options
&lt;/span&gt;&lt;span class="py"&gt;sshPort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;22022&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Adjust ulimit and systemd Limits Early
&lt;/h2&gt;

&lt;p&gt;Insufficient file handles or process limits will cause instability under concurrency and batch workloads. Address this across systemd, profile, and limits simultaneously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system.conf&lt;/span&gt;
&lt;span class="nv"&gt;DefaultLimitNOFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;655350
&lt;span class="nv"&gt;DefaultLimitNPROC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;655350
systemctl daemon-reexec
systemctl restart sshd

&lt;span class="c"&gt;# Also update /etc/profile and /etc/security/limits.conf with appropriate nofile settings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Key Installation Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create the operating system user and directories&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;useradd gbaseadm
passwd gbaseadm
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /data/gbase8a
&lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; gbaseadm:gbaseadm /data/gbase8a
&lt;span class="nb"&gt;chown &lt;/span&gt;gbaseadm:gbaseadm /tmp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract the package and run the environment setup script&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /data
&lt;span class="nb"&gt;tar &lt;/span&gt;xfj GBase8a_MPP_Cluster-NoLicense-FREE-9.5.3-demo-redhat7-x86_64.tar.bz2
python SetSysEnv.py &lt;span class="nt"&gt;--dbaUser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbaseadm &lt;span class="nt"&gt;--installPrefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/data/gbase8a &lt;span class="nt"&gt;--cgroup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write the installation configuration file &lt;code&gt;install.options&lt;/code&gt;&lt;/strong&gt;, specifying the install directory, coordinator hosts, data hosts, management hosts, user credentials, and SSH port.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run the silent installation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./gcinstall.py &lt;span class="nt"&gt;--silent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;install.options
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Validate Cluster Health Immediately After Installation
&lt;/h2&gt;

&lt;p&gt;A completed install script does not guarantee a healthy cluster. Always run &lt;code&gt;gcadmin&lt;/code&gt; to verify that the cluster state is &lt;code&gt;ACTIVE&lt;/code&gt; and that all gcware, coordinator, and data node roles show &lt;code&gt;OPEN&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Configure and Verify Distribution Settings
&lt;/h2&gt;

&lt;p&gt;Prepare a distribution XML file, apply it with &lt;code&gt;gcadmin distribution&lt;/code&gt;, and inspect the result with &lt;code&gt;gcadmin showdistribution node&lt;/code&gt;. This step directly determines how data is placed across nodes and how the load is balanced.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Parameter Tuning Recommendations
&lt;/h2&gt;

&lt;p&gt;Avoid changing many parameters at once. Prioritise based on symptom category: for connection and timeout issues, check &lt;code&gt;max_connections&lt;/code&gt; and &lt;code&gt;connect_timeout&lt;/code&gt; first; for concurrency and thread pool pressure, look at &lt;code&gt;gbase_parallel_degree&lt;/code&gt;; for loading bottlenecks, examine &lt;code&gt;gcluster_loader_max_data_processors&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A smooth GBase 8a installation relies on getting the basics right before running any installer. When the network, SSH, system limits, and cluster health checks are all solid, the rest of the &lt;strong&gt;gbase database&lt;/strong&gt; operations become far more predictable.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>operations</category>
    </item>
    <item>
      <title>GBase 8a Performance Troubleshooting and Stability Governance: From Slow Queries to Primary-Replica Consistency</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 23 Jun 2026 14:39:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-performance-troubleshooting-and-stability-governance-from-slow-queries-to-primary-replica-5g5i</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-performance-troubleshooting-and-stability-governance-from-slow-queries-to-primary-replica-5g5i</guid>
      <description>&lt;p&gt;Performance and stability issues in a GBase 8a cluster rarely exist in isolation. A query that suddenly slows down, wildly uneven node execution times, unstable results, or even local replica inconsistency can stem from node‑level execution differences, data skew, intermediate result bloat, underlying environment anomalies, or primary‑replica consistency problems. This article integrates node‑level slow‑query diagnosis, data skew detection, intermediate result control, log and audit correlation, and primary‑replica consistency handling into a systematic troubleshooting workflow for your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Slow Query Diagnosis: Pinpoint the Node First
&lt;/h2&gt;

&lt;p&gt;Before rewriting SQL, determine whether the slowdown is cluster‑wide or isolated to a few nodes. Enable recording of queries that exceed a threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;GLOBAL&lt;/span&gt; &lt;span class="n"&gt;gcluster_dql_statistic_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- record queries over 3 seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieve recently recorded slow queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sys_sqls&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drill into per‑node execution times for a specific SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sys_sql_elapsepernode&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sql_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'actual_sql_id'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If only one or two nodes lag significantly while others are fast, investigate that node's resources, data distribution, or local failure. If all nodes are uniformly slow, examine the SQL logic, intermediate result size, or global parameter settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Uneven Node Load: Suspect Data Skew First
&lt;/h2&gt;

&lt;p&gt;When node execution times differ drastically, data skew is usually the prime suspect. A query like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_order&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;can cause a few nodes to shoulder most of the work if &lt;code&gt;region_code&lt;/code&gt; is heavily imbalanced. Diagnosis order: confirm skew → determine if the table's distribution key is flawed or dynamic redistribution is at fault → then decide whether to fix the model, rewrite the SQL, or tune parameters. Parameter tuning alone rarely cures a bad distribution key.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Intermediate Result Bloat Can Hurt More Than Scanning
&lt;/h2&gt;

&lt;p&gt;A slow query's bottleneck is often not reading data, but the sheer size of intermediate results. &lt;code&gt;SELECT *&lt;/code&gt; quickly inflates result sets, making subsequent sorting, aggregation, and network exchange far heavier. The safer approach is to select only necessary columns, push filters as early as possible, and reduce data volume before large joins. When needed, consult &lt;code&gt;express.log&lt;/code&gt; to see which execution stages are truly expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Correlate Logs and Audit Trails for Faster Diagnosis
&lt;/h2&gt;

&lt;p&gt;Node‑level statistics tell you &lt;em&gt;which&lt;/em&gt; SQL is slow and &lt;em&gt;where&lt;/em&gt;, but logs explain &lt;em&gt;why&lt;/em&gt; it's slow and what anomalies accompanied it. Focus on &lt;code&gt;express.log&lt;/code&gt; (engine exceptions), &lt;code&gt;system.log&lt;/code&gt; (crash stack traces), and &lt;code&gt;gcware.log&lt;/code&gt; (node state and replica operations). Audit logs reveal the history of bulk operations, DDL changes, and parameter modifications — correlate them with performance dips to quickly narrow the cause.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Primary‑Replica Inconsistency: Stop Treating It as a Tuning Problem
&lt;/h2&gt;

&lt;p&gt;When results become unstable, a node behaves abnormally after recovery, or DML succeeds but subsequent reads are inconsistent, shift the investigation to primary‑replica consistency. Common causes include inconsistent local parameters, sudden power loss, RAID controller/driver issues, VM abnormal exit, or manual mistakes. GBase 8a provides the &lt;code&gt;gcluster_suffix_consistency_resolve&lt;/code&gt; parameter (default 0; set to 1 to attempt automatic resolution). It can detect and repair row‑count mismatches, schema inconsistencies, and SCN discrepancies, provided the cluster has at least three host nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Recommended Troubleshooting Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Is the slowdown global or per‑node?&lt;/strong&gt; Use node‑level statistics to locate tail nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is data skew present?&lt;/strong&gt; Check distribution key design and dynamic redistribution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examine intermediate results and resource consumption.&lt;/strong&gt; Slim down columns and watch execution logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlate audit and system logs.&lt;/strong&gt; Look for recent operational anomalies or resource conflicts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rule out primary‑replica consistency issues.&lt;/strong&gt; Prioritise this when instability follows node recovery or environment events.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Layering the investigation before deciding whether to rewrite SQL, tune parameters, scan logs, or repair consistency is far more efficient than jumping to conclusions in a &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>operations</category>
    </item>
    <item>
      <title>Making GBase 8a Backup and Recovery Reliable: Full Backups, Incrementals, and Recovery Drills</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 23 Jun 2026 13:34:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/making-gbase-8a-backup-and-recovery-reliable-full-backups-incrementals-and-recovery-drills-1flb</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/making-gbase-8a-backup-and-recovery-reliable-full-backups-incrementals-and-recovery-drills-1flb</guid>
      <description>&lt;p&gt;The &lt;code&gt;gcrcman&lt;/code&gt; tool in GBase 8a supports backup and recovery at the cluster, database, table, and batch-table levels. In a production &lt;strong&gt;gbase database&lt;/strong&gt;, knowing the commands isn't enough — you need a sound backup strategy, verified recovery prerequisites, and the right granularity for the incident you're facing. This guide covers full and incremental backups, table‑level recovery, batch backup limits, and regular recovery drills.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Layered Backup Strategy
&lt;/h2&gt;

&lt;p&gt;Don't back up everything the same way. Design your strategy around business importance and recovery objectives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core databases&lt;/strong&gt;: Periodic full backups (&lt;code&gt;level 0&lt;/code&gt;) as the ultimate safety net.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequently changing objects&lt;/strong&gt;: Incremental backups (&lt;code&gt;level 1&lt;/code&gt;) to refine recovery points without the cost of daily fulls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High‑risk tables&lt;/strong&gt;: Dedicated table‑level or batch backups to recover from accidental drops or bad &lt;code&gt;DELETE&lt;/code&gt;/&lt;code&gt;UPDATE&lt;/code&gt; — by far the most common incidents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Prerequisites Matter More Than the Commands
&lt;/h2&gt;

&lt;p&gt;Before running &lt;code&gt;gcrcman.py&lt;/code&gt;, ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You operate as the &lt;code&gt;dbauser&lt;/code&gt; specified during installation.&lt;/li&gt;
&lt;li&gt;Execution happens on a &lt;strong&gt;coordinator&lt;/strong&gt; node.&lt;/li&gt;
&lt;li&gt;All nodes are network‑reachable, and the backup directory exists on every node with read/write permissions for &lt;code&gt;dbauser&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The backup path is &lt;strong&gt;never&lt;/strong&gt; under &lt;code&gt;$GCLUSTER_BASE&lt;/code&gt;, &lt;code&gt;$GBASE_BASE&lt;/code&gt;, or &lt;code&gt;$GCWARE_BASE&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The cluster topology (management nodes, data nodes, distribution info) at recovery time matches the topology at backup time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Essential Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View backup history&lt;/span&gt;
python &lt;span class="nv"&gt;$GCLUSTER_BASE&lt;/span&gt;/server/bin/gcrcman.py &lt;span class="nt"&gt;-d&lt;/span&gt; /backup/cluster_bak &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"show backup"&lt;/span&gt;

&lt;span class="c"&gt;# Cluster‑level full backup&lt;/span&gt;
python &lt;span class="nv"&gt;$GCLUSTER_BASE&lt;/span&gt;/server/bin/gcrcman.py &lt;span class="nt"&gt;-d&lt;/span&gt; /backup/cluster_bak &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"backup level 0"&lt;/span&gt;

&lt;span class="c"&gt;# Database‑level full backup&lt;/span&gt;
python &lt;span class="nv"&gt;$GCLUSTER_BASE&lt;/span&gt;/server/bin/gcrcman.py &lt;span class="nt"&gt;-d&lt;/span&gt; /backup/cluster_bak &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"backup database appdb level 0"&lt;/span&gt;

&lt;span class="c"&gt;# Table‑level full backup&lt;/span&gt;
python &lt;span class="nv"&gt;$GCLUSTER_BASE&lt;/span&gt;/server/bin/gcrcman.py &lt;span class="nt"&gt;-d&lt;/span&gt; /backup/cluster_bak &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"backup table appdb.fact_order level 0"&lt;/span&gt;

&lt;span class="c"&gt;# Table‑level force recovery (overwrites existing table)&lt;/span&gt;
python &lt;span class="nv"&gt;$GCLUSTER_BASE&lt;/span&gt;/server/bin/gcrcman.py &lt;span class="nt"&gt;-d&lt;/span&gt; /backup/cluster_bak &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"recover force table appdb.fact_order"&lt;/span&gt;

&lt;span class="c"&gt;# Refresh table metadata after recovery&lt;/span&gt;
refresh table appdb.fact_order&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key parameters: &lt;code&gt;-r&lt;/code&gt; sets parallelism (default 4), &lt;code&gt;-t&lt;/code&gt; sets the transaction wait timeout (default 300 s), &lt;code&gt;-C&lt;/code&gt; enables backup data verification.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Table‑Level Recovery Is the Most Practical
&lt;/h2&gt;

&lt;p&gt;Dropping a table by mistake is the most frequent production accident. After performing a table‑level recovery, always run &lt;code&gt;refresh table&lt;/code&gt; to re‑register the table object in the cluster. Verify the recovered row count matches expectations.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Batch Backup Limitations
&lt;/h2&gt;

&lt;p&gt;To back up a set of tables, list them in a file (format &lt;code&gt;database.table&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;appdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_order&lt;/span&gt;
&lt;span class="n"&gt;appdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_trade_detail&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nv"&gt;$GCLUSTER_BASE&lt;/span&gt;/server/bin/gcrcman.py &lt;span class="nt"&gt;-d&lt;/span&gt; /backup/cluster_bak &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"backup tables table.list level 0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Critical rule&lt;/strong&gt;: Within the same backup cycle, the &lt;code&gt;table.list&lt;/code&gt; file must be &lt;strong&gt;identical&lt;/strong&gt; for the full backup and all subsequent incremental backups. You cannot add new tables to an incremental run — doing so corrupts the cycle logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Recovery Drills Validate the Whole Effort
&lt;/h2&gt;

&lt;p&gt;Untested backups are worthless. Establish a regular drill cadence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weekly&lt;/strong&gt;: Restore a few critical tables and verify row counts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monthly&lt;/strong&gt;: Perform a full database recovery exercise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After any topology change&lt;/strong&gt;: Re‑validate the entire recovery chain.&lt;/li&gt;
&lt;li&gt;During every drill, check: row counts, business query correctness, view and index integrity, and whether recovery time fits the acceptable window.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A reliable backup and recovery practice in a &lt;strong&gt;gbase database&lt;/strong&gt; goes beyond running &lt;code&gt;gcrcman.py&lt;/code&gt;. It's a closed loop of strategy, permissions, topology awareness, and continuous verification.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>operations</category>
    </item>
    <item>
      <title>GBase 8a Operations Inspection and Alerting: Don't Wait for a Failure to Check the Logs</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 23 Jun 2026 12:29:29 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-operations-inspection-and-alerting-dont-wait-for-a-failure-to-check-the-logs-1mka</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-operations-inspection-and-alerting-dont-wait-for-a-failure-to-check-the-logs-1mka</guid>
      <description>&lt;p&gt;Keeping a &lt;strong&gt;gbase database&lt;/strong&gt; cluster running smoothly in production isn't just about fixing problems — it's about having a solid routine for inspection, monitoring, slow‑query analysis, audit log usage, and tiered alerting. This article covers these five areas with practical, actionable steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Inspections Go Beyond Cluster Status — Cover Three Layers
&lt;/h2&gt;

&lt;p&gt;Effective daily inspections span three layers: the cluster layer (node status, service processes), the database layer (slow SQL, connection counts, session states), and the system layer (CPU, memory, disk, I/O). Relying solely on &lt;code&gt;gcadmin&lt;/code&gt; to check that the cluster is ACTIVE won't tell you why queries suddenly slowed or why one node consistently lags.&lt;/p&gt;

&lt;p&gt;Essential daily inspection commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcadmin
ps &lt;span class="nt"&gt;-ef&lt;/span&gt; | egrep &lt;span class="s1"&gt;'gcware|gcluster|gnode'&lt;/span&gt;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt; /opt/gbase/gcluster/log/system.log
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt; /opt/gbase/gcware/log/gcware.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Prioritise Core Monitoring Metrics — Avoid Dashboard Clutter
&lt;/h2&gt;

&lt;p&gt;Monitor the following five categories first, before expanding to a full dashboard:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Typical Metrics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cluster availability&lt;/td&gt;
&lt;td&gt;Node online, cluster ACTIVE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource pressure&lt;/td&gt;
&lt;td&gt;CPU, memory, disk usage, I/O wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL behaviour&lt;/td&gt;
&lt;td&gt;Slow query count, execution duration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection status&lt;/td&gt;
&lt;td&gt;Connection count, active sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational trails&lt;/td&gt;
&lt;td&gt;Audit logs, backend errors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start by collecting per‑node CPU/memory/IO, cluster state, critical process liveness, disk usage, slow‑query statistics, and core‑log error counts. These alone often reveal issues before users notice.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Slow‑Query Monitoring: Record Them, Then Pinpoint Which Node
&lt;/h2&gt;

&lt;p&gt;In a distributed &lt;strong&gt;gbase database&lt;/strong&gt;, slow queries are often caused by just a few overloaded nodes. Enable slow‑query recording first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;GLOBAL&lt;/span&gt; &lt;span class="n"&gt;gcluster_dql_statistic_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- record queries over 3 seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then retrieve the recorded queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sys_sqls&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Capture the data first, observe the patterns, and only then decide whether to adjust parallelism, thread pools, or other parameters — never tune blindly.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Include Logs and Audit Trails in Routine Checks
&lt;/h2&gt;

&lt;p&gt;Don't wait for a failure to read logs. Spot‑check for these signals daily: abnormal node states, repeated recovery messages, frequent internal errors, load anomalies, and audit export failures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt; /opt/gbase/gcluster/log/system.log | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-50&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'warn'&lt;/span&gt;  /opt/gbase/gcware/log/gcware.log | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Audit logs are more than a compliance checkbox — they let you trace who did what and when, and can reveal bulk operations that preceded a slowdown. GBase 8a consolidates audit records into the &lt;code&gt;audit_log_express&lt;/code&gt; table. Add audit export health, unexpected DDL/DML, and sudden audit volume spikes to your inspection list.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Tier Your Alerts to Prevent Fatigue
&lt;/h2&gt;

&lt;p&gt;Group alerts into three severity levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P1 – Critical&lt;/strong&gt;: Node offline, cluster not ACTIVE, key process missing, disk full&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P2 – Important&lt;/strong&gt;: Slow‑query surge, abnormal connection count, audit anomaly, excessive I/O&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P3 – Warning&lt;/strong&gt;: Negative trends, fast disk growth, rising log alert frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For disk usage, trigger a P2 warning above &lt;strong&gt;85%&lt;/strong&gt; and a P1 critical alert above &lt;strong&gt;95%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Recommended Operational Cadence
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily&lt;/strong&gt;: &lt;code&gt;gcadmin&lt;/code&gt;, check key processes, review system logs, inspect disk space, look for abnormal slow‑query growth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly&lt;/strong&gt;: Slow‑query trends, connection count changes, audit log spot‑check, node load balance, backup and data‑load task status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monthly&lt;/strong&gt;: Parameter baseline review, hardware health check, log alert trend analysis, alert threshold adjustments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A stable &lt;strong&gt;gbase database&lt;/strong&gt; isn't just about what you do when things break — it's about seeing the signals that were there all along. Build the routine, tier the alerts, and you'll catch most problems before they become incidents.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>GBase 8a Table Design and Modeling: Choosing Data Types, Partitions, Distribution Keys, and Replicated Tables</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sun, 21 Jun 2026 15:50:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-table-design-and-modeling-choosing-data-types-partitions-distribution-keys-and-dm2</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-table-design-and-modeling-choosing-data-types-partitions-distribution-keys-and-dm2</guid>
      <description>&lt;p&gt;In a distributed analytical &lt;strong&gt;gbase database&lt;/strong&gt;, many performance issues are baked in at the table design stage. Data types, partitioning, distribution keys, and replicated table strategies largely determine query cost down the line. This guide walks through these four core design decisions with practical, implementable advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Modeling Matters More Than Post‑Hoc Tuning
&lt;/h2&gt;

&lt;p&gt;The GBase 8a community consensus on query optimisation is clear: prioritise business SQL and table structure first, then tune database parameters, and only then add hardware. The way data is organised sets the upper bound for query performance.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Design Area&lt;/th&gt;
&lt;th&gt;Common Shortcut&lt;/th&gt;
&lt;th&gt;Later Pain&lt;/th&gt;
&lt;th&gt;Better Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data types&lt;/td&gt;
&lt;td&gt;Store everything as strings&lt;/td&gt;
&lt;td&gt;Heavy scans, poor compression, constant casting&lt;/td&gt;
&lt;td&gt;Choose types by actual semantics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partitioning&lt;/td&gt;
&lt;td&gt;Skip it initially, add later&lt;/td&gt;
&lt;td&gt;Hard to manage, clean, and query large tables&lt;/td&gt;
&lt;td&gt;Partition time‑based large tables early&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distribution key&lt;/td&gt;
&lt;td&gt;Pick any familiar column&lt;/td&gt;
&lt;td&gt;Node skew, slow GROUP/JOIN&lt;/td&gt;
&lt;td&gt;Prefer high‑cardinality columns used in frequent JOINs/GROUPs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicated tables&lt;/td&gt;
&lt;td&gt;Build everything as a distribution table&lt;/td&gt;
&lt;td&gt;Extra redistribution on small‑table JOINs&lt;/td&gt;
&lt;td&gt;Consider replication for small, frequently‑joined dimension tables&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Data Types: They Dictate Compression, Scanning, and Computation
&lt;/h2&gt;

&lt;p&gt;The clearer the business semantics, the less you should compromise on types.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Status and type codes&lt;/strong&gt;: Use &lt;code&gt;TINYINT&lt;/code&gt;/&lt;code&gt;SMALLINT&lt;/code&gt;/&lt;code&gt;INT&lt;/code&gt;, not &lt;code&gt;VARCHAR&lt;/code&gt; for enumerated values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monetary amounts&lt;/strong&gt;: Use &lt;code&gt;DECIMAL&lt;/code&gt;; avoid &lt;code&gt;FLOAT&lt;/code&gt;/&lt;code&gt;DOUBLE&lt;/code&gt; precision issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time‑based filter columns&lt;/strong&gt;: Use &lt;code&gt;DATE&lt;/code&gt;/&lt;code&gt;DATETIME&lt;/code&gt;/&lt;code&gt;TIMESTAMP&lt;/code&gt;; never store dates as &lt;code&gt;VARCHAR&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed sequence numbers&lt;/strong&gt;: Use &lt;code&gt;BIGINT&lt;/code&gt;; &lt;code&gt;INT&lt;/code&gt; risks overflow on large tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anti‑pattern vs. correct approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Anti‑pattern: string‑everything&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ods_order_raw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;order_status&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pay_amt&lt;/span&gt;      &lt;span class="nb"&gt;DOUBLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_time&lt;/span&gt;  &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Correct: semantic types&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ods_order_raw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;     &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;      &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_status&lt;/span&gt; &lt;span class="nb"&gt;TINYINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pay_amt&lt;/span&gt;      &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;create_time&lt;/span&gt;  &lt;span class="nb"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Partitioning: Plan for Large Tables from the Start
&lt;/h2&gt;

&lt;p&gt;GBase 8a supports RANGE, LIST, HASH, and KEY partitioning. Total partitions cannot exceed 8,192; production best practice is to keep per‑table partitions under 50. The partition key column cannot be updated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tables that benefit from partitioning&lt;/strong&gt;: daily/monthly fact tables, historical log tables — data with natural time boundaries that need periodic cleanup and range queries. Skip partitioning for small dimension tables and high‑update small tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dwd_trade_detail&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;trade_id&lt;/span&gt;   &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;    &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;shop_id&lt;/span&gt;    &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pay_amt&lt;/span&gt;    &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;trade_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trade_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202601&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-02-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202602&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202603&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-04-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Partition pruning is the real payoff&lt;/strong&gt;: partitioning helps only when queries land on a subset of partitions. Avoid wrapping the partition key in functions (&lt;code&gt;DATE_FORMAT&lt;/code&gt;); use direct range filters to let partition pruning work.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Hash Distribution Key: The Foundation of Horizontal Data Placement
&lt;/h2&gt;

&lt;p&gt;The distribution key determines how evenly data is spread across nodes and directly impacts whether GROUP BY and JOIN can execute locally. Evaluate in this order: data uniformity → frequent JOIN column → frequent GROUP BY column → still uniform after filtering.&lt;/p&gt;

&lt;p&gt;Common mistake: using low‑cardinality columns like &lt;code&gt;province_code&lt;/code&gt; as the distribution key, causing severe node skew and forcing extra redistribution during aggregation and JOINs.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Replicated Tables: Best for Small Dimension Tables
&lt;/h2&gt;

&lt;p&gt;A replicated table stores a full copy on every gnode, enabling fully local JOINs with fact tables — zero network transfer. Ideal for small, frequently‑read dimension and dictionary tables. Avoid for large fact tables and high‑churn large tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dim_region&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;region_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;REPLICATED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Recommended Modeling Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define types by business semantics&lt;/strong&gt; — lock down the real meaning of status codes, amounts, times, and primary keys first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decide on partitioning&lt;/strong&gt; — time‑accumulating large tables and log tables are the prime candidates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose the distribution strategy&lt;/strong&gt; — for distribution tables, prioritise uniformity, then JOIN/GROUP needs; evaluate replication for small dimension tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review expected query patterns&lt;/strong&gt; — verify that future queries will filter by the partition key and frequently JOIN/GROUP by the chosen distribution key.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In a &lt;strong&gt;gbase database&lt;/strong&gt;, slow queries are often not "discovered" — they are "built in" at the design stage. Getting data types, partitioning, distribution keys, and replication right from the start dramatically reduces the tuning burden later.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>Deep Dive into GBase 8a MPP Distributed Query Execution</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sun, 21 Jun 2026 14:43:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/deep-dive-into-gbase-8a-mpp-distributed-query-execution-k12</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/deep-dive-into-gbase-8a-mpp-distributed-query-execution-k12</guid>
      <description>&lt;p&gt;How does a SQL statement travel through a GBase 8a cluster — from parsing and plan generation to parallel execution and final aggregation? This article explains the complete execution path, the roles of coordinator and data nodes, and common performance pitfalls in a &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Architecture Recap: Three Roles
&lt;/h2&gt;

&lt;p&gt;GBase 8a MPP Cluster consists of three core process types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Process&lt;/th&gt;
&lt;th&gt;Node Type&lt;/th&gt;
&lt;th&gt;Primary Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gcluster&lt;/td&gt;
&lt;td&gt;Coordinator&lt;/td&gt;
&lt;td&gt;SQL parsing, plan generation, task distribution, result assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Data Node&lt;/td&gt;
&lt;td&gt;Data storage, local scan, partial aggregation, Hash Join&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gcware&lt;/td&gt;
&lt;td&gt;Cluster Manager&lt;/td&gt;
&lt;td&gt;Heartbeat, replica consistency arbitration, failover&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Clients communicate only with gcluster. gcluster holds metadata (table definitions, distribution info, replica topology) but stores no user data.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Full Lifecycle of a Query
&lt;/h2&gt;

&lt;p&gt;Consider this typical analytical query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;dept_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dept_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 1: Parsing and Semantic Checks (gcluster)
&lt;/h3&gt;

&lt;p&gt;The SQL Parser in gcluster converts the text into an AST and performs semantic validation — verifying that tables and columns exist and that data types are compatible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Query Plan Generation (gcluster)
&lt;/h3&gt;

&lt;p&gt;The optimizer generates a Distributed Query Plan (DQP) based on metadata. Two core decisions are made:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pushdown vs. aggregation&lt;/strong&gt;: Filter conditions like &lt;code&gt;WHERE order_date &amp;gt;= '2024-01-01'&lt;/code&gt; are pushed down to each gnode to avoid transferring full datasets. Because &lt;code&gt;dept_id&lt;/code&gt; is unlikely to be the distribution key, aggregation requires each gnode to first perform partial aggregation, then redistribute the partial results by &lt;code&gt;dept_id&lt;/code&gt; hash before doing final aggregation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data redistribution strategy&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hash Redistribute&lt;/strong&gt;: Triggered when the JOIN/GROUP BY column is not the distribution key. Cost: network transfer + shuffle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broadcast&lt;/strong&gt;: Small tables can be broadcast to all nodes instead of being redistributed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No redistribution&lt;/strong&gt;: Optimal — when the JOIN/GROUP BY column happens to be the distribution key.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key parameters: &lt;code&gt;gcluster_hash_redistribute_join_optimize&lt;/code&gt; and &lt;code&gt;gcluster_hash_redistribute_groupby_optimize&lt;/code&gt; control whether small tables are broadcast to avoid unnecessary hash shuffles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3: Task Distribution and Parallel Execution (gcluster → gnode)
&lt;/h3&gt;

&lt;p&gt;gcluster splits the DQP into multiple fragments and sends them concurrently to all participating gnodes over internal TCP channels. Each gnode then uses worker threads (controlled by &lt;code&gt;gbase_parallel_degree&lt;/code&gt;) to scan its local data segments in parallel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcluster
  ├─ Fragment-1 → gnode1 (local scan + partial aggregation)
  ├─ Fragment-1 → gnode2 (local scan + partial aggregation)
  └─ Fragment-1 → gnode3 (local scan + partial aggregation)
         ↓
  [Hash Redistribute by dept_id]
         ↓
  ├─ Fragment-2 → gnode1 (final aggregation)
  ├─ Fragment-2 → gnode2
  └─ Fragment-2 → gnode3
         ↓
  gcluster merges TOP 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 4: Final Merge and Return to Client (gcluster)
&lt;/h3&gt;

&lt;p&gt;Each gnode streams its fragment result back to gcluster. For &lt;code&gt;ORDER BY ... LIMIT 100&lt;/code&gt;, gcluster performs a final merge‑sort to pick the top‑N rows and returns them to the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Intermediate Tables and Debugging
&lt;/h2&gt;

&lt;p&gt;For complex queries, gnodes create internal temporary tables that are automatically dropped after execution. To keep them for troubleshooting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;gcluster_executor_debug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ Debug only — never leave this on in production, or intermediate tables will fill the disk.&lt;/p&gt;

&lt;p&gt;To see currently executing queries and per‑node timings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="n"&gt;PROCESSLIST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Requires prior configuration (gcluster_dql_statistic_threshold in milliseconds)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dql_statistic&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Common Query Performance Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Cartesian Product Causing Disk Spikes
&lt;/h3&gt;

&lt;p&gt;When a JOIN condition is missing, two large tables produce a Cartesian product that can reach terabytes. Cap intermediate row counts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# gnode gbase.cnf
&lt;/span&gt;&lt;span class="py"&gt;_gbase_result_threshold&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1000000000  -- error if &amp;gt;1 billion rows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfall 2: Data Skew Turning One Node into a Bottleneck
&lt;/h3&gt;

&lt;p&gt;GROUP BY on a low‑cardinality column concentrates all data on a few nodes after hash redistribution. Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose a high‑cardinality distribution key&lt;/li&gt;
&lt;li&gt;Enable multi‑column hash redistribution for skewed GROUP BYs:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;_t_gcluster_distinct_multi_redist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;_t_gcluster_hash_redistribute_groupby_on_multiple_expression&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfall 3: Small Tables Treated as Distribution Tables During JOINs
&lt;/h3&gt;

&lt;p&gt;The optimizer may hash‑redistribute many small tables, generating excessive network traffic. Build frequently used small tables as replicated tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dim_region&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;region_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;REPLICATED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Process&lt;/th&gt;
&lt;th&gt;Key Actions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parse &amp;amp; Optimize&lt;/td&gt;
&lt;td&gt;gcluster&lt;/td&gt;
&lt;td&gt;AST creation, DQP planning, redistribution strategy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Execution&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Data scan, partial aggregation, Hash Join&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Shuffle&lt;/td&gt;
&lt;td&gt;gnode ↔ gnode&lt;/td&gt;
&lt;td&gt;Hash Redistribute / Broadcast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final Merge&lt;/td&gt;
&lt;td&gt;gcluster&lt;/td&gt;
&lt;td&gt;Merge‑sort, Top‑N, return to client&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Understanding this pipeline is the key to pinpointing bottlenecks in a &lt;strong&gt;gbase database&lt;/strong&gt;: is the redistribution too expensive? Is one gnode scanning too slowly? Or has gcluster become the single‑point merge bottleneck? Use &lt;code&gt;EXPLAIN&lt;/code&gt; and &lt;code&gt;dql_statistic&lt;/code&gt; system tables for precise diagnosis.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>GBase 8a Table Design in Practice: Choosing Distribution Keys, Partitions, and Replicated Tables</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sun, 21 Jun 2026 14:10:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-table-design-in-practice-choosing-distribution-keys-partitions-and-replicated-tables-403e</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-table-design-in-practice-choosing-distribution-keys-partitions-and-replicated-tables-403e</guid>
      <description>&lt;p&gt;Many performance issues are baked in the moment a table is created. This guide systematically explains table design decisions in GBase 8a: how to pick distribution keys, when to partition, how to use replicated tables, and how to choose the right data types — with anti‑patterns and a complete example.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. How Data Is Distributed Across Nodes
&lt;/h2&gt;

&lt;p&gt;GBase 8a uses a Shared‑Nothing architecture. Data is horizontally partitioned and spread across gnodes based on the distribution key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;    &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dept_id&lt;/span&gt;     &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;      &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;order_date&lt;/span&gt;  &lt;span class="nb"&gt;DATE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;DISTRIBUTED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A hash function maps every row with the same &lt;code&gt;customer_id&lt;/code&gt; to the same gnode. If &lt;code&gt;DISTRIBUTED BY&lt;/code&gt; is omitted, the first column is used by default — rarely what you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Core Principles for Choosing a Distribution Key
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High cardinality&lt;/strong&gt;: The more unique values, the more evenly data is spread. &lt;code&gt;user_id&lt;/code&gt; or &lt;code&gt;order_id&lt;/code&gt; are ideal; &lt;code&gt;gender&lt;/code&gt; or &lt;code&gt;province&lt;/code&gt; cause severe skew.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The column used in high‑frequency JOINs&lt;/strong&gt;: If two tables are often joined on the same key, set that key as the distribution key on both sides. The JOIN then runs locally without cross‑node data shuffle, giving the best performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid date or time columns&lt;/strong&gt;: They have limited unique values and are almost never used in JOIN conditions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Partitioning: How It Differs from Distribution
&lt;/h2&gt;

&lt;p&gt;The distribution key decides &lt;em&gt;which node&lt;/em&gt; data goes to; partitioning decides how data is organised &lt;em&gt;inside&lt;/em&gt; each node. GBase 8a supports Range partitioning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;   &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;     &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;DISTRIBUTED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2023&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2025&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;pmax&lt;/span&gt;  &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="k"&gt;MAXVALUE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Partition pruning&lt;/strong&gt;: when the query includes a filter on the partition key, only the relevant partitions are scanned. Use partitioning when a single node holds tens of GBs or more, queries frequently filter by time range, or you need fast historical data cleanup (&lt;code&gt;ALTER TABLE DROP PARTITION&lt;/code&gt; is orders of magnitude faster than &lt;code&gt;DELETE&lt;/code&gt;). Avoid partitioning for tables under 100 million rows, full‑scan workloads, or when the partition count exceeds 1,000 (metadata overhead becomes significant).&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Replicated Tables: The Best Strategy for Small Dimension Tables
&lt;/h2&gt;

&lt;p&gt;For lookup tables, dictionary tables, and other small, rarely‑updated tables, use replication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dim_product&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;REPLICATED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A replicated table stores a full copy on every gnode. JOINs between a fact table and a replicated table require zero network transfer — they run entirely locally. Replication is ideal when row count is under 1 million and updates are rare. Between 1–10 million rows with occasional updates, proceed with caution. Beyond 10 million rows or with frequent writes, use a distribution table with a proper key.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Data Type Selection
&lt;/h2&gt;

&lt;p&gt;GBase 8a is a columnar store engine. Data types directly affect compression ratio and query performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strings&lt;/strong&gt;: Store enumerated values as &lt;code&gt;TINYINT&lt;/code&gt;/&lt;code&gt;SMALLINT&lt;/code&gt;; use &lt;code&gt;VARCHAR&lt;/code&gt; only for truly variable‑length descriptions. Low‑cardinality strings compress extremely well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Numbers&lt;/strong&gt;: Use &lt;code&gt;INT&lt;/code&gt;/&lt;code&gt;BIGINT&lt;/code&gt; for integers — never &lt;code&gt;DECIMAL(20,0)&lt;/code&gt;. Use &lt;code&gt;DECIMAL(18,2)&lt;/code&gt; for monetary amounts; never &lt;code&gt;DOUBLE&lt;/code&gt; (floating‑point precision issues).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal&lt;/strong&gt;: Use &lt;code&gt;DATETIME&lt;/code&gt; for full timestamps, &lt;code&gt;DATE&lt;/code&gt; for date‑only columns. Never store dates as &lt;code&gt;VARCHAR&lt;/code&gt; — it prevents partition pruning and date‑function optimisations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Complete Table Design Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Fact table: large, distributed by high‑cardinality customer_id, partitioned by quarter&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;     &lt;span class="nb"&gt;BIGINT&lt;/span&gt;      &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;  &lt;span class="nb"&gt;INT&lt;/span&gt;         &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt;         &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dept_id&lt;/span&gt;      &lt;span class="nb"&gt;SMALLINT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;       &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;       &lt;span class="nb"&gt;TINYINT&lt;/span&gt;     &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_date&lt;/span&gt;   &lt;span class="nb"&gt;DATE&lt;/span&gt;        &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_time&lt;/span&gt;  &lt;span class="nb"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;DISTRIBUTED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024q1&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-04-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024q2&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-07-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024q3&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-10-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024q4&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2025&lt;/span&gt;   &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;pmax&lt;/span&gt;    &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="k"&gt;MAXVALUE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Dimension table: small, replicated&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dim_product&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt;          &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;brand&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;REPLICATED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Common Anti‑Patterns
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Anti‑Pattern&lt;/th&gt;
&lt;th&gt;Consequence&lt;/th&gt;
&lt;th&gt;Correct Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No distribution key specified&lt;/td&gt;
&lt;td&gt;Defaults to first column, often skewed&lt;/td&gt;
&lt;td&gt;Explicitly specify &lt;code&gt;DISTRIBUTED BY HASH(appropriate_column)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distribution on low‑cardinality columns&lt;/td&gt;
&lt;td&gt;Severe node imbalance&lt;/td&gt;
&lt;td&gt;Use high‑cardinality columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dimension table as a distribution table&lt;/td&gt;
&lt;td&gt;Hash redistribution on every JOIN&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;REPLICATED&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;VARCHAR(255)&lt;/code&gt; for enumerated values&lt;/td&gt;
&lt;td&gt;Poor compression, higher memory&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;TINYINT&lt;/code&gt;/&lt;code&gt;SMALLINT&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Excessive partitions (&amp;gt;1,000)&lt;/td&gt;
&lt;td&gt;High metadata overhead, slow planning&lt;/td&gt;
&lt;td&gt;Partition by quarter or year instead of day&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Good table design is the starting point of performance optimisation in a &lt;strong&gt;gbase database&lt;/strong&gt;. Changing a distribution key later requires rebuilding the table — a very expensive operation. During the design phase, answer three questions: what JOIN conditions are used most? Does the query workload have obvious time‑range filters? How large is the table and how frequently is it written? These answers directly determine your distribution key, partitioning strategy, and whether to use replication.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>Permission Governance in GBase 8c: Separate Role Boundaries First, Then Assign Privileges</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sun, 21 Jun 2026 13:29:13 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/permission-governance-in-gbase-8c-separate-role-boundaries-first-then-assign-privileges-30c7</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/permission-governance-in-gbase-8c-separate-role-boundaries-first-then-assign-privileges-30c7</guid>
      <description>&lt;p&gt;Chaos in permission management almost always starts with granting privileges directly to users. The foundation of a maintainable &lt;strong&gt;gbase database&lt;/strong&gt; security model is strict separation of Users, Roles, and Privileges — users log in, roles carry permissions, and object privileges are granted only to roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Core Principle: Users Bind to Roles, Roles Carry Permissions
&lt;/h2&gt;

&lt;p&gt;A typical three‑tier role structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read‑only role&lt;/strong&gt;: for reports, audits, and read‑only access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read‑write role&lt;/strong&gt;: for routine application reads and writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management role&lt;/strong&gt;: for object creation and maintenance, never bound directly to application programs.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create roles&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_read_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_ddl_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create users&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;app_reader&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'Example#2026'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;app_writer&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'Example#2026'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;app_owner&lt;/span&gt;  &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'Example#2026'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Bind users to roles&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;app_read_role&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_reader&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;   &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_writer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;app_ddl_role&lt;/span&gt;  &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_owner&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grant database, schema, and object privileges to the roles, never to individual users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;bizdb&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_read_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_ddl_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_read_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_ddl_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;settle_result&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_read_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;settle_result&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_ddl_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When someone changes roles, you only adjust the user‑role binding — no per‑table re‑grant needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. When Troubleshooting, Check the Upper Permission Layers First
&lt;/h2&gt;

&lt;p&gt;Many "missing table permission" errors are actually missing &lt;code&gt;CONNECT&lt;/code&gt; or &lt;code&gt;USAGE&lt;/code&gt; higher up. Follow this order:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Most Likely Missing Privilege&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cannot connect to database&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONNECT ON DATABASE&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema visible but object access fails&lt;/td&gt;
&lt;td&gt;&lt;code&gt;USAGE ON SCHEMA&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query on a table fails&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT ON TABLE/VIEW&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write operations fail&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;INSERT&lt;/code&gt;/&lt;code&gt;UPDATE&lt;/code&gt;/&lt;code&gt;DELETE&lt;/code&gt;, sometimes &lt;code&gt;SELECT&lt;/code&gt; also required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calling a function fails&lt;/td&gt;
&lt;td&gt;&lt;code&gt;EXECUTE ON FUNCTION&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  3. Use Default Privileges to Set Boundaries for Future Objects
&lt;/h2&gt;

&lt;p&gt;Manual &lt;code&gt;GRANT&lt;/code&gt; only affects existing objects. New tables, sequences, and functions won't inherit those grants. &lt;code&gt;ALTER DEFAULT PRIVILEGES&lt;/code&gt; defines preset access rules for future objects, preventing midnight alerts caused by forgotten grants.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_read_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;billing&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;SEQUENCES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_rw_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply default privileges early in any schema where objects are continuously created.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Separation of Duties for High‑Security Environments
&lt;/h2&gt;

&lt;p&gt;GBase 8c's separation of duties splits traditional superuser power into a System Administrator (&lt;code&gt;SYSADMIN&lt;/code&gt;) and a Security Administrator (&lt;code&gt;CREATEROLE&lt;/code&gt; + &lt;code&gt;POLADMIN&lt;/code&gt;). This prevents a single account from both maintaining the system and having unlimited access to data. It's strongly recommended in finance, government, and telecom environments. Note: when separation of duties is not enabled, the system administrator's effective privileges are broader.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Least Privilege by Business Action Chain
&lt;/h2&gt;

&lt;p&gt;Least privilege means "exactly what's needed to perform the task," not "as little as possible."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Report querying&lt;/strong&gt;: &lt;code&gt;CONNECT&lt;/code&gt; + &lt;code&gt;USAGE&lt;/code&gt; + &lt;code&gt;SELECT&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business writes&lt;/strong&gt;: &lt;code&gt;CONNECT&lt;/code&gt; + &lt;code&gt;USAGE&lt;/code&gt; + &lt;code&gt;SELECT&lt;/code&gt; + &lt;code&gt;INSERT&lt;/code&gt; + &lt;code&gt;UPDATE&lt;/code&gt; + &lt;code&gt;DELETE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calling functions&lt;/strong&gt;: add &lt;code&gt;EXECUTE&lt;/code&gt; to the above&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating objects&lt;/strong&gt;: &lt;code&gt;CREATE ON SCHEMA/DATABASE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table maintenance&lt;/strong&gt;: add &lt;code&gt;INDEX&lt;/code&gt;, &lt;code&gt;VACUUM&lt;/code&gt;, &lt;code&gt;ALTER&lt;/code&gt; as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Connection Entry Is Also a Permission Boundary
&lt;/h2&gt;

&lt;p&gt;Security governance must cover not only object‑level privileges but also who can connect from which IP using which authentication method. Regularly review &lt;code&gt;listen_addresses&lt;/code&gt; and &lt;code&gt;pg_hba.conf&lt;/code&gt;. Manually editing &lt;code&gt;pg_hba.conf&lt;/code&gt; is a high‑risk operation and must follow documented procedures.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Recommended Governance Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Separate administrator responsibilities&lt;/strong&gt; — evaluate separation of duties; at minimum distinguish ops, security, and audit roles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design roles by job function&lt;/strong&gt;, not by individual.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grant database and schema privileges first&lt;/strong&gt;, then table/view/function privileges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set default privileges&lt;/strong&gt; so new objects automatically inherit the right rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Users only bind to roles&lt;/strong&gt; — never grant object privileges directly to users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unify connection‑level and object‑level governance&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A solid permission design in a &lt;strong&gt;gbase database&lt;/strong&gt; isn't about writing clever GRANT statements — it's about building a role hierarchy that stays clean as teams and objects grow. When the foundation is right, audits are painless, incident boundaries are clear, and new objects land with the correct permissions from day one.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>security</category>
    </item>
    <item>
      <title>Data Lifecycle Management in GBase 8c: Partitioning, Archiving, and Cleanup</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sat, 20 Jun 2026 15:39:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/data-lifecycle-management-in-gbase-8c-partitioning-archiving-and-cleanup-2e3i</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/data-lifecycle-management-in-gbase-8c-partitioning-archiving-and-cleanup-2e3i</guid>
      <description>&lt;p&gt;When a table grows unchecked for a couple of years, historical, log, and hot data mix together, making queries, deletions, and backups increasingly heavy. GBase 8c supports range, interval, list, and hash partitioning, providing an ideal foundation for data lifecycle management. The core is three things: smooth ingestion of new data, low‑risk archiving of old data, and stable cleanup of expired data.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Lifecycle Management Means Long‑Term Control
&lt;/h2&gt;

&lt;p&gt;Typical symptoms: a query for the last 7 days scans 3 years of data; deleting history causes heavy transactions and lock contention; archiving relies on slow &lt;code&gt;INSERT INTO archive SELECT ...&lt;/code&gt;; statistics drift and execution plans wobble. Lifecycle management turns the migration from hot → warm → cold → deletable data into a predictable, routine operation. Partitioned tables are the natural fit: queries only touch relevant partitions, and maintenance actions are scoped to a single partition rather than the entire table.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Time‑Based Partitioning Is the Most Practical Choice
&lt;/h2&gt;

&lt;p&gt;Although GBase 8c offers four partition types, the most natural boundary for lifecycle management is time. Range partitioning works well for data with clear start‑end intervals (monthly tables, billing period tables), while interval partitioning automatically extends partitions as time‑series data grows, saving manual effort.&lt;/p&gt;

&lt;p&gt;Choose partition keys that are frequently used in query predicates, have reasonably even distribution, and are not frequently updated. Date‑type columns such as &lt;code&gt;trade_date&lt;/code&gt;, &lt;code&gt;log_time&lt;/code&gt; are ideal lifecycle boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Start with Monthly Partitions
&lt;/h2&gt;

&lt;p&gt;Slicing by hour or day improves pruning but explodes the number of partition objects. For transaction details, logs, and event streams, monthly partitions typically strike a good balance between management overhead and pruning effectiveness.&lt;/p&gt;

&lt;p&gt;Example of monthly range partitioning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;acct_trade_detail&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;trade_id&lt;/span&gt;        &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;acct_no&lt;/span&gt;         &lt;span class="n"&gt;varchar2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;trade_time&lt;/span&gt;      &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trade_date&lt;/span&gt;      &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trade_amt&lt;/span&gt;       &lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;trade_status&lt;/span&gt;    &lt;span class="n"&gt;varchar2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;channel_code&lt;/span&gt;    &lt;span class="n"&gt;varchar2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trade_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202601&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-02-01 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202602&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-03-01 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202603&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-04-01 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;pmax&lt;/span&gt;   &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAXVALUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want automatic extension for continuous growth, use interval partitioning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;app_event_log&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;       &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;        &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_time&lt;/span&gt;     &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_date&lt;/span&gt;     &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;     &lt;span class="n"&gt;varchar2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'1 month'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202601&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-02-01 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202602&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-03-01 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Maintenance Must Follow Up
&lt;/h2&gt;

&lt;p&gt;The second half of lifecycle management is even more critical: pre‑creating new partitions, archiving old partitions, dropping expired partitions, and then updating statistics and reclaiming space.&lt;/p&gt;

&lt;p&gt;Common maintenance commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Reclaim space and update visibility for a specific partition&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="n"&gt;acct_trade_detail&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p202601&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;acct_trade_detail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;acct_trade_detail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the MVCC model, old versions after updates or deletes don't disappear immediately — &lt;code&gt;VACUUM&lt;/code&gt; gradually reclaims space and maintains the visibility map.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Prefer Partition Drop Over Conditional DELETE
&lt;/h2&gt;

&lt;p&gt;Once a table is partitioned by time, dropping a partition is vastly more efficient than a large‑scale &lt;code&gt;DELETE ... WHERE&lt;/code&gt;. It avoids massive transactions, reduces lock contention, and eliminates the need for an immediate, heavy &lt;code&gt;VACUUM&lt;/code&gt;. Always confirm retention rules, back up or archive the data, then drop the partition safely.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Archiving Is About Isolating Online Workloads
&lt;/h2&gt;

&lt;p&gt;Archiving isn't just copying data out — it separates the online workload from historical queries. Even if historical data is "rarely queried," keeping it in the live main table still impacts statistics, maintenance cost, backup size, and some global operations. Use a three‑tier data model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hot data&lt;/strong&gt;: live main table, high‑frequency reads and writes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Warm data&lt;/strong&gt;: online archive table or low‑traffic database, occasional queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold data&lt;/strong&gt;: historical archive or external storage, extremely rare access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Separating hot and historical tables clearly makes the online layer far easier to manage.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Combine with Automatic Vacuuming and Statistics Updates
&lt;/h2&gt;

&lt;p&gt;After archiving or dropping partitions, always run &lt;code&gt;ANALYZE&lt;/code&gt; to prevent the optimizer from relying on outdated distribution statistics. Properly configure &lt;code&gt;AUTOVACUUM&lt;/code&gt; to execute &lt;code&gt;VACUUM&lt;/code&gt; and &lt;code&gt;ANALYZE&lt;/code&gt; automatically, reclaiming space and refreshing statistics. Build lifecycle maintenance into a fixed operational cadence: pre‑create partitions at month start, archive at month end, drop expired partitions, and refresh statistics after every large change.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. A Practical Lifecycle Management Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define retention boundaries first&lt;/strong&gt; (e.g., 90 days online, 12 months archive, 24 months purge)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use a time column as the primary partition key&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start with monthly partitions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Separate online, archive, and purge layers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use partition drop instead of conditional DELETE wherever possible&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Follow up every major change with VACUUM/ANALYZE&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Well‑designed lifecycle management lets you fully leverage GBase 8c's partitioning capabilities in your &lt;strong&gt;gbase database&lt;/strong&gt;: lighter queries, smaller backups, and lower maintenance overhead. The question isn't "how big is the table?" but rather "is there a clear hot/cold boundary? Are objects split by lifecycle? Does cleanup still rely on heavy‑weight conditional statements? Have statistics and space been refreshed after cleanup?" Once these questions are answered, many downstream operational headaches simply disappear.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>operations</category>
    </item>
    <item>
      <title>Making GBase 8c Auditing Work: Traceable, Retainable, and Queryable</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sat, 20 Jun 2026 14:33:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/making-gbase-8c-auditing-work-traceable-retainable-and-queryable-202m</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/making-gbase-8c-auditing-work-traceable-retainable-and-queryable-202m</guid>
      <description>&lt;p&gt;GBase 8c offers a comprehensive auditing framework, but simply flipping the switch is not enough for production. Effective auditing requires systematic design across audit scope, granularity, retention, and query access. This article focuses on making critical actions traceable — covering audit item configuration, log retention, using &lt;code&gt;pg_query_audit&lt;/code&gt; as the primary query entry point, and routine inspection.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Define Audit Goals Before Selecting Items
&lt;/h2&gt;

&lt;p&gt;GBase 8c supports a wide range of audit items — login/logout, privilege changes, DDL, DML, SELECT, COPY, function execution, SET parameters, etc. Most items can be enabled dynamically without a restart. However, enabling everything indiscriminately will flood the logs. Prioritise based on your goals:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Recommended Items&lt;/th&gt;
&lt;th&gt;Avoid Enabling Immediately&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Security compliance&lt;/td&gt;
&lt;td&gt;Login/logout, user lock/unlock, privilege grant/revoke, database start/stop&lt;/td&gt;
&lt;td&gt;Full SELECT, all function execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational traceability&lt;/td&gt;
&lt;td&gt;Object DDL, SET parameters, database process events, COPY&lt;/td&gt;
&lt;td&gt;Full audit for all users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business data trails&lt;/td&gt;
&lt;td&gt;DML on specific tables, supplement with SELECT when necessary&lt;/td&gt;
&lt;td&gt;Blanket DML + SELECT across all tables&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A layered approach works best in practice: a baseline of system‑level audits (login, privilege, DDL, key parameter changes) that are always on, supplemented by targeted auditing on sensitive tables, key accounts, or during critical time windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Dynamic Parameter Changes for On‑Demand Auditing
&lt;/h2&gt;

&lt;p&gt;The master switch &lt;code&gt;audit_enabled&lt;/code&gt; and most subordinate switches can be reloaded at runtime, making temporary audit escalation straightforward. For example, to temporarily track DML on a specific table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gs_guc reload &lt;span class="nt"&gt;-N&lt;/span&gt; all &lt;span class="nt"&gt;-I&lt;/span&gt; all &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"audit_dml_state = 1"&lt;/span&gt;
gs_guc reload &lt;span class="nt"&gt;-N&lt;/span&gt; all &lt;span class="nt"&gt;-I&lt;/span&gt; all &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"audit_dml_state_select = 1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the current settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_directory&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_enabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_dml_state&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_dml_state_select&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Use pg_query_audit as Your Primary Query Tool
&lt;/h2&gt;

&lt;p&gt;The built‑in function &lt;code&gt;pg_query_audit(start_time, end_time)&lt;/code&gt; lets you query audit records directly by time window, avoiding manual log scraping. Filter by action type and object name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;detail_info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_query_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-03-25 09:00:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-25 10:00:00'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'dml_action'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dml_action_select'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;detail_info&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%acct_trade_detail%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To trace a specific user's actions, combine the time range with the username and object name.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Retention Policies Must Match Business Traceability Requirements
&lt;/h2&gt;

&lt;p&gt;GBase 8c provides these key parameters for managing audit log storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_directory&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;-- storage directory&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_resource_policy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;-- retention policy&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_space_limit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;-- total space cap&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_file_remain_time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- minimum retention (default 90 days)&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;audit_file_remain_threshold&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="c1"&gt;-- max file count threshold&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common pitfalls: setting the space limit too low causes logs from a temporary audit escalation to be rolled off too quickly; retention time that doesn't align with monthly or quarterly review cycles leads to missing evidence. Design retention tiers based on scenario — keep baseline security audits long‑term, extend retention for sensitive databases, and promptly reduce granularity after temporary investigations.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. OS‑File Storage for Audit Independence
&lt;/h2&gt;

&lt;p&gt;GBase 8c writes audit results to operating system files rather than database tables by default. This separation prevents highly privileged users from tampering with audit records, reinforcing their credibility. In production, restrict access to the audit directory and consider using a dedicated security auditor role.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Recommended Rollout Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Enable baseline security items first&lt;/strong&gt;: login/logout, privilege changes, object DDL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify directory and retention settings&lt;/strong&gt;: check the parameters above to ensure logs aren't lost prematurely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add DML/SELECT auditing for critical objects&lt;/strong&gt;: target sensitive tables, key accounts, and specific time windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build a set of standard query templates&lt;/strong&gt;: at minimum, templates for querying by time, object name, and action type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate auditing into routine inspections&lt;/strong&gt;: monitor audit directory growth and look for abnormal spikes in SELECT/DML volume.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal of auditing isn't to record everything, but to make every critical action traceable. Following this methodology turns GBase 8c's auditing capabilities into a reliable evidence chain for your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>security</category>
    </item>
    <item>
      <title>GBase 8c Performance Tuning: A Systematic Approach from Statistics and Execution Plans to Resource Pools</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sat, 20 Jun 2026 13:27:00 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8c-performance-tuning-a-systematic-approach-from-statistics-and-execution-plans-to-resource-g0i</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8c-performance-tuning-a-systematic-approach-from-statistics-and-execution-plans-to-resource-g0i</guid>
      <description>&lt;p&gt;GBase 8c, the China‑domestically developed multi‑model database from GBASE, supports row‑store, column‑store, and distributed deployment. When a query slows down, the cause often lies deeper than SQL syntax — outdated statistics, a shifted execution plan, or resource contention. This article walks through a layered tuning methodology: verify statistics, inspect the execution plan, align storage and distribution with workload, and finally manage sessions and resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. A Layered Perspective on Tuning
&lt;/h2&gt;

&lt;p&gt;Performance issues in a &lt;strong&gt;gbase database&lt;/strong&gt; generally fall into three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model layer&lt;/strong&gt;: Performance is unstable from the start, and scaling doesn't help. Check storage mode, distribution strategy, and index design.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizer layer&lt;/strong&gt;: The same SQL suddenly shows a different plan with volatile execution times. Check statistics, EXPLAIN output, and misplaced hints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource layer&lt;/strong&gt;: Everything slows down during peak hours, even if no single query is terrible. Check work_mem, shared_buffers, resource pools, and Cgroups.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Statistics: The Foundation of the Execution Plan
&lt;/h2&gt;

&lt;p&gt;The optimizer relies on statistics collected by &lt;code&gt;ANALYZE&lt;/code&gt; and stored in &lt;code&gt;pg_class&lt;/code&gt;, &lt;code&gt;pg_statistic&lt;/code&gt;, etc. Stale statistics lead to inaccurate row estimates and poor plan choices.&lt;/p&gt;

&lt;p&gt;Always update statistics after bulk loads, deletes, archiving, partition switches, or when data distribution changes on hot columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Single table&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;sales_order&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Entire database&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Specific columns&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;sales_order&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Verify with EXPLAIN ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pay_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales_order&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For partitioned tables, &lt;code&gt;ANALYZE&lt;/code&gt; updates both the parent and all child partitions — essential for accurate partition pruning.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Reading Execution Plans: Focus on Row Estimates and Operator Choice
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)&lt;/code&gt; to get detailed runtime information. Key indicators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row estimate vs. actual&lt;/strong&gt;: Large discrepancies lead to poor JOIN or scan choices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scan type&lt;/strong&gt;: A Seq Scan on a large, frequently filtered column suggests missing indexes or stale statistics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join type&lt;/strong&gt;: Hash Join spilling to disk usually means work_mem is too low or the input set is too large. Nested Loop driven by a large result set often points to wrong row estimates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sort and aggregation&lt;/strong&gt;: High cost on Sort/GroupAggregate may be reduced by slimming the column list or pre‑aggregating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer hit ratio&lt;/strong&gt;: A low shared hit ratio suggests the buffer cache may be undersized.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;VERBOSE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSTS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BUFFERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TIMING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pay_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales_order&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;dim_customer&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'VIP'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common plan signals and actions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Likely Cause&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Seq Scan on large table&lt;/td&gt;
&lt;td&gt;Missing index or bad row estimate&lt;/td&gt;
&lt;td&gt;Verify statistics first, then index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hash Join with heavy spill&lt;/td&gt;
&lt;td&gt;work_mem too small or large input&lt;/td&gt;
&lt;td&gt;Reduce input, increase session memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nested Loop with large driver&lt;/td&gt;
&lt;td&gt;Severely inaccurate row estimate&lt;/td&gt;
&lt;td&gt;Fix statistics, then consider hint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Sort / GroupAggregate&lt;/td&gt;
&lt;td&gt;Bloated column set&lt;/td&gt;
&lt;td&gt;Slim SQL, pre‑aggregate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  4. Hints: Emergency Intervention Only
&lt;/h2&gt;

&lt;p&gt;Plan hints (&lt;code&gt;/*+ ... */&lt;/code&gt;) such as Leading, HashJoin, NestLoop, IndexScan, SeqScan, and Rows allow you to override the optimizer. Use them only for short‑term fixes or when the optimizer consistently chooses the wrong plan despite accurate statistics and proper indexes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ Leading((c o)) HashJoin(c o) */&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pay_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dim_customer&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales_order&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'VIP'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always follow up a hint with model and parameter improvements; don't let it become a permanent crutch.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Key Parameters and Slow Query Tracking
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;work_mem&lt;/strong&gt;: Controls memory for sorts and hash joins. Set it per session based on concurrency — too high risks memory exhaustion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shared_buffers&lt;/strong&gt;: Database shared buffer size, critical for read‑heavy workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statement tracking&lt;/strong&gt;: Configure &lt;code&gt;track_stmt_stat_level&lt;/code&gt; (full/slow), &lt;code&gt;log_min_duration_statement&lt;/code&gt; (threshold), and &lt;code&gt;enable_stmt_track&lt;/code&gt;. Retrieve slow queries with:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbe_perf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_global_slow_sql_by_timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'2026-03-24 09:00:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'2026-03-24 09:10:00'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Resource Management with Cgroups and Resource Pools
&lt;/h2&gt;

&lt;p&gt;GBase 8c's resource management is built on Linux Cgroups, configured via &lt;code&gt;gs_cgroup&lt;/code&gt;. Resource pools isolate CPU, memory, and I/O for different workloads — online transactions, reports, ETL — preventing a single heavy query from starving the entire cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Choosing the Right Storage and Distribution
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row store (&lt;code&gt;orientation=row&lt;/code&gt;)&lt;/strong&gt;: Best for frequent point queries, updates, and short transactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column store (&lt;code&gt;orientation=column&lt;/code&gt;)&lt;/strong&gt;: Ideal for analytical scans and aggregations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replicated tables (&lt;code&gt;DISTRIBUTE BY replication&lt;/code&gt;)&lt;/strong&gt;: Small dimension tables that are joined frequently — eliminates cross‑node data movement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hash distribution (&lt;code&gt;DISTRIBUTE BY hash&lt;/code&gt;)&lt;/strong&gt;: Large fact tables, distributed on the most common JOIN key or high‑frequency access column.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Transaction detail: row store, hash distributed by order_id&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;txn_order&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;      &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_time&lt;/span&gt;    &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_status&lt;/span&gt;  &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pay_amount&lt;/span&gt;    &lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DISTRIBUTE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Analytical summary: column store, hash distributed by customer_id&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;rpt_order_day&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;stat_date&lt;/span&gt;      &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;    &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;city_id&lt;/span&gt;        &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_cnt&lt;/span&gt;      &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pay_amount_sum&lt;/span&gt; &lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DISTRIBUTE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Small dimension: replicated&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dim_city&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;city_id&lt;/span&gt;    &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;city_name&lt;/span&gt;  &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;region_id&lt;/span&gt;  &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;DISTRIBUTE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;replication&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  8. A Systematic Tuning Workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the problem is reproducible and capture the business time window.&lt;/li&gt;
&lt;li&gt;Verify statement tracking settings and collect slow queries.&lt;/li&gt;
&lt;li&gt;Analyze the execution plan with &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; — focus on row estimates and operator choices.&lt;/li&gt;
&lt;li&gt;Update statistics to give the optimizer accurate data.&lt;/li&gt;
&lt;li&gt;Tune SQL, add indexes, or apply hints as a short‑term measure.&lt;/li&gt;
&lt;li&gt;For peak‑time issues, examine resource pools, Cgroups, memory, and buffer cache as a whole.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Building a reliable &lt;strong&gt;gbase database&lt;/strong&gt; performance baseline means keeping statistics fresh, understanding how the optimizer thinks, aligning storage models with actual workloads, and establishing clear resource boundaries. This layered approach prevents the common cycle of reactive, single‑query patches and delivers consistent performance at scale.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
    </item>
    <item>
      <title>GBase 8a Operations in Practice: Load Monitoring, Audit Logs, and Memory Tuning</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Sat, 20 Jun 2026 12:22:16 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-operations-in-practice-load-monitoring-audit-logs-and-memory-tuning-5781</link>
      <guid>https://dev.clauneck.workers.dev/michaelfv/gbase-8a-operations-in-practice-load-monitoring-audit-logs-and-memory-tuning-5781</guid>
      <description>&lt;p&gt;This guide covers three core areas of daily GBase 8a operations: tracking data loads and collecting error details, configuring audit logs and analysing slow queries, and hierarchically tuning memory parameters. It also provides a standard daily and weekly inspection checklist for your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Data Load Monitoring
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 Load Methods
&lt;/h3&gt;

&lt;p&gt;GBase 8a supports two main load methods: &lt;code&gt;gload&lt;/code&gt; for large‑scale offline imports (recommended), and &lt;code&gt;LOAD DATA INFILE&lt;/code&gt; for single‑file loads with MySQL‑like syntax.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 Checking Load Progress
&lt;/h3&gt;

&lt;p&gt;Monitor running and historical loads through system tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Currently executing load tasks&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;loaded_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TIMESTAMPDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SECOND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;elapsed_sec&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_task&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'RUNNING'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'PENDING'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Last 50 load history records&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loaded_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TIMESTAMPDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SECOND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;duration_sec&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_task&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.3 Retrieving the Last Load Task ID
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;gbase_loader_last_task_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query error details with that ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_error_log&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'your_task_id'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.4 Error Data Collection
&lt;/h3&gt;

&lt;p&gt;Enable error collection in the gcluster configuration file (&lt;code&gt;gbase.cnf&lt;/code&gt;) for production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;gbase_loader_logs_collect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;ON&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.5 Load Performance Parameters
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Recommended&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gcluster_loader_max_data_processors&lt;/td&gt;
&lt;td&gt;gcluster&lt;/td&gt;
&lt;td&gt;Max concurrent load processing threads&lt;/td&gt;
&lt;td&gt;CPU cores / 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gcluster_loader_min_chunk_size&lt;/td&gt;
&lt;td&gt;gcluster&lt;/td&gt;
&lt;td&gt;Chunk size sent to gnode (bytes)&lt;/td&gt;
&lt;td&gt;64 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_loader_parallel_degree&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Parallel write threads on gnode&lt;/td&gt;
&lt;td&gt;4 – 8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_loader_buffer_count&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Number of load buffers&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Audit Log Configuration and Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Enabling Audit Logs
&lt;/h3&gt;

&lt;p&gt;Configure in both gcluster and gnode &lt;code&gt;gbase.cnf&lt;/code&gt; files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;audit_log&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;ON&lt;/span&gt;
&lt;span class="py"&gt;log_output&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;FILE          # or TABLE&lt;/span&gt;
&lt;span class="py"&gt;long_query_time&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;5             # seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.2 Querying When log_output = TABLE
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Recent slow queries&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lock_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rows_sent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rows_examined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SUBSTR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sql_snippet&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slow_log&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Top SQL patterns by average execution time&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SUBSTR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sql_pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;exec_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows_examined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_rows_scanned&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slow_log&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;DATE_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;sql_pattern&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;avg_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.3 Node‑Level SQL Execution Time Monitoring
&lt;/h3&gt;

&lt;p&gt;Set the threshold in gcluster &lt;code&gt;gbase.cnf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;gcluster_dql_statistic_threshold&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;3000   # milliseconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query per‑node execution times:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;sql_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exec_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rows_processed&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dql_statistic&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;exec_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;sql_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one node's &lt;code&gt;exec_time&lt;/code&gt; is far higher than the others, suspect data skew or a hardware issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Memory Parameter Tuning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Memory Hierarchy
&lt;/h3&gt;

&lt;p&gt;The gnode process memory is governed by &lt;code&gt;gbase_memory_pct_target&lt;/code&gt; (percentage of system memory). Beneath it, heap memory is split into &lt;code&gt;gbase_heap_data&lt;/code&gt; (normal operations) and &lt;code&gt;gbase_heap_large&lt;/code&gt; (heavy operations like sorts/joins), plus multiple operation‑level buffers.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Key Parameters
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Typical Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gbase_memory_pct_target&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;% of system memory for gnode&lt;/td&gt;
&lt;td&gt;70 – 80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_heap_data&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Heap for normal ops (MB)&lt;/td&gt;
&lt;td&gt;30% of total memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_heap_large&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Heap for large ops (MB)&lt;/td&gt;
&lt;td&gt;30% of total memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_buffer_hj&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Hash Join buffer (MB)&lt;/td&gt;
&lt;td&gt;512 – 2048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_buffer_sort&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Sort buffer (MB)&lt;/td&gt;
&lt;td&gt;512 – 2048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gbase_buffer_hgrby&lt;/td&gt;
&lt;td&gt;gnode&lt;/td&gt;
&lt;td&gt;Hash Group By buffer (MB)&lt;/td&gt;
&lt;td&gt;512 – 1024&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3.3 Example Configuration (64 GB Physical RAM Node)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# gnode gbase.cnf
&lt;/span&gt;&lt;span class="py"&gt;gbase_memory_pct_target&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;75      # gnode uses 48 GB&lt;/span&gt;
&lt;span class="py"&gt;gbase_heap_data&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;16384   # 16 GB&lt;/span&gt;
&lt;span class="py"&gt;gbase_heap_large&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;16384   # 16 GB&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_hj&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2048&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_hgrby&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1024&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_distgrby&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1024&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_sort&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1024&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_rowset&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;256&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_result&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;512&lt;/span&gt;
&lt;span class="py"&gt;gbase_buffer_insert&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;256&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.4 Monitoring Actual Memory Usage
&lt;/h3&gt;

&lt;p&gt;Enable session memory statistics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;_gbase_session_memory_stat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query per‑session memory consumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_used&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;memory_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUBSTR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sql_snippet&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_memory_stat&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;memory_used&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.5 Hot Data Eviction Under Memory Pressure
&lt;/h3&gt;

&lt;p&gt;In gnode &lt;code&gt;gbase.cnf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;_gbase_cache_drop_hot_data&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;_gbase_cache_drop_unlock_cell_count&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1000&lt;/span&gt;
&lt;span class="py"&gt;_gbase_cache_drop_delay_time&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Connection and Timeout Quick Reference
&lt;/h2&gt;

&lt;p&gt;Key timeout parameters in gcluster &lt;code&gt;gbase.cnf&lt;/code&gt; include &lt;code&gt;connect_timeout&lt;/code&gt; (handshake), read/write timeouts, internal reconnect settings, &lt;code&gt;gcluster_lock_timeout&lt;/code&gt;, and &lt;code&gt;Wait_timeout&lt;/code&gt; for idle sessions. JDBC clients should also specify &lt;code&gt;connectTimeout&lt;/code&gt; and &lt;code&gt;socketTimeout&lt;/code&gt; in the URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Daily Operations Checklist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Daily checks&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- 1. Node status&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_heartbeat_time&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node_info&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 2. Yesterday's load failure rate&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_tasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'FAILED'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;failed_tasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_error_rows&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_task&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CURDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="n"&gt;failed_tasks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;total_error_rows&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 3. Long‑running active transactions&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processlist&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Weekly checks&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- 4. Data volume balance across nodes&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;data_gb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;segment_info&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;node_name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;data_gb&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 5. Top 10 slow queries of the week&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SUBSTR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_sec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_sec&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;gclusterdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slow_log&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;DATE_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CURDATE&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;avg_sec&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Regularly inspecting system tables under &lt;code&gt;gclusterdb&lt;/code&gt; helps you spot potential issues before they impact your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>operations</category>
    </item>
  </channel>
</rss>
