<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://prashant-sharma.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://prashant-sharma.com/" rel="alternate" type="text/html" /><updated>2026-04-15T11:08:07+00:00</updated><id>https://prashant-sharma.com/feed.xml</id><title type="html">Prashant Sharma</title><subtitle>Technical blog about Apache spark, Prestodb presto and Apache Iceberg. Learn more about me on the {{ site.url }}/about </subtitle><entry><title type="html">Simple performance patterns but massive gains - Iceberg on Prestodb.</title><link href="https://prashant-sharma.com/iceberg/prestodb/2026/03/30/perf-tune-iceberg-on-prestodb.html" rel="alternate" type="text/html" title="Simple performance patterns but massive gains - Iceberg on Prestodb." /><published>2026-03-30T00:24:31+00:00</published><updated>2026-03-30T00:24:31+00:00</updated><id>https://prashant-sharma.com/iceberg/prestodb/2026/03/30/perf-tune-iceberg-on-prestodb</id><content type="html" xml:base="https://prashant-sharma.com/iceberg/prestodb/2026/03/30/perf-tune-iceberg-on-prestodb.html"><![CDATA[<p>Recently, an issue came to me from a user complaining that their code took 43 minutes to update 1000 rows. My immediate impression was that maybe they had a really complex update query. The typical update query syntax is:</p>

<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">UPDATE</span> <span class="k">table_name</span> <span class="k">SET</span> <span class="p">[</span> <span class="k">column</span> <span class="o">=</span> <span class="n">expression</span> <span class="p">[,</span> <span class="p">...</span> <span class="p">]</span> <span class="p">]</span> <span class="p">[</span> <span class="k">WHERE</span> <span class="n">condition</span> <span class="p">]</span></code></pre></figure>

<p>More on: <a href="https://prestodb.io/docs/current/sql/update.html">prestodb update docs</a></p>

<p>This is designed to update several rows at once, based on a WHERE clause condition being met. It turned out the user was issuing one update query per row of the target table, which held 1000 rows. That means they had about 1000 update queries. Since the table was an Iceberg table stored on HDFS, this had a completely different meaning than it would on an RDBMS. Neither Iceberg nor HDFS is designed for such usage patterns. Users coming from an RDBMS background often carry the same expectations over to systems designed for handling data sizes beyond petabytes. No wonder the sales guys take the adventurous customer through the new and fancy product, and the customer buys it whether or not they are going to use those features. Their hope is: if it’s faster for very large data, then it will be faster for my small amount of data as well, and since the data is growing, it will be wise to migrate to such a system sooner rather than later.</p>

<p>Anyway, for each executed update Iceberg was generating a new snapshot and all the associated metadata. Updating each row with its own update SQL therefore leaves behind a huge amount of Iceberg metadata. Iceberg is not designed for this kind of usage pattern, though it does offer maintenance that can compact the metadata and prune unnecessary snapshots by calling stored procedures: <a href="https://iceberg.apache.org/docs/latest/spark-procedures/">Iceberg stored procedures</a>. Calling these stored procedures and achieving exactly what we want requires a strategy of its own (a topic for another time).</p>
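
<p>To see the bloat for yourself, one option is to count the table's snapshots before and after the per-row updates. The sketch below only builds the query text. It assumes the Presto Iceberg connector's <code>$snapshots</code> metadata table is available in your version, and the class and method names are made up for illustration.</p>

```java
// Hypothetical helper: builds a query against the "$snapshots" metadata table.
// Assumes the Presto Iceberg connector exposes "<table>$snapshots" in your version.
class SnapshotCountQuery {
    // Wraps an identifier in double quotes, escaping embedded quotes.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Returns SQL that counts the snapshots recorded for the given table.
    static String build(String catalog, String schema, String table) {
        return "SELECT count(*) FROM " + quote(catalog) + "." + quote(schema)
                + "." + quote(table + "$snapshots");
    }

    public static void main(String[] args) {
        System.out.println(build("iceberg", "perf_test", "perf_test_tab"));
    }
}
```

<p>If each update really commits on its own, as described above, running this query before and after the batch should show the count growing by roughly one per update statement.</p>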

<p>Their code for updating those rows looked like the following.</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">  <span class="nc">String</span> <span class="n">updateSQL</span> <span class="o">=</span> <span class="nc">String</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">"UPDATE \"%s\".\"%s\".\"%s\" SET \"C2\" = ?, \"C3\" = ?, \"C4\" = ? WHERE \"C1\" = ?"</span><span class="o">,</span> <span class="s">"iceberg"</span><span class="o">,</span> <span class="s">"perf_test"</span><span class="o">,</span> <span class="s">"perf_test_tab"</span><span class="o">);</span>
  <span class="n">preparedStatement</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="na">prepareStatement</span><span class="o">(</span><span class="n">updateSQL</span><span class="o">);</span>
  <span class="c1">// update batch</span>
  <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="no">TOTAL_RECORDS</span><span class="o">;</span> <span class="o">++</span><span class="n">i</span><span class="o">)</span> <span class="o">{</span>
      <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setString</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="s">"UpdatedValue_"</span> <span class="o">+</span> <span class="n">i</span><span class="o">);</span>
      <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setLong</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="mi">1000L</span> <span class="o">+</span> <span class="o">(</span><span class="kt">long</span><span class="o">)</span> <span class="n">i</span><span class="o">);</span>
      <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setBigDecimal</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="o">(</span><span class="k">new</span> <span class="nc">BigDecimal</span><span class="o">(</span><span class="s">"123.45"</span><span class="o">)).</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="nc">BigDecimal</span><span class="o">(</span><span class="n">i</span><span class="o">)));</span>
      <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setInt</span><span class="o">(</span><span class="mi">4</span><span class="o">,</span> <span class="n">i</span><span class="o">);</span>
      <span class="n">preparedStatement</span><span class="o">.</span><span class="na">addBatch</span><span class="o">();</span>
      <span class="k">if</span> <span class="o">(</span><span class="n">i</span> <span class="o">%</span> <span class="mi">20</span> <span class="o">==</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
          <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"  Added "</span> <span class="o">+</span> <span class="n">i</span> <span class="o">+</span> <span class="s">" statements to batch..."</span><span class="o">);</span>
      <span class="o">}</span>
  <span class="o">}</span>

  <span class="n">startExecTime</span> <span class="o">=</span> <span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
  <span class="n">results</span> <span class="o">=</span> <span class="n">preparedStatement</span><span class="o">.</span><span class="na">executeBatch</span><span class="o">();</span>
  <span class="n">execTime</span> <span class="o">=</span> <span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">()</span> <span class="o">-</span> <span class="n">startExecTime</span><span class="o">;</span></code></pre></figure>

<p>This was consuming a whopping 43 minutes, for the reasons already explained.</p>
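
<p>As a back-of-the-envelope check, the numbers above work out to about 2.58 seconds per update statement on average. The tiny sketch below just does that division; the class name is made up, and the only inputs are the 43 minutes and 1000 statements quoted in this post.</p>

```java
// Back-of-the-envelope numbers taken from this post: 43 minutes for 1000 updates.
class PerRowCost {
    // Average wall-clock seconds spent per statement.
    static double secondsPerStatement(double totalMinutes, int statements) {
        return totalMinutes * 60.0 / statements;
    }

    public static void main(String[] args) {
        System.out.println(secondsPerStatement(43, 1000) + " s per update");
    }
}
```

<p>A couple of seconds per statement is not outrageous for an engine that has to plan the query and commit table metadata each time. It is the multiplication by 1000 that hurts.</p>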

<p>If we want to update all the rows (or a large number of rows) in a table, we can insert the data to be updated into a new table first and then use MERGE as follows.</p>

<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">presto</span><span class="o">&gt;</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">c1</span> <span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
 <span class="n">c1</span> <span class="o">|</span>        <span class="n">c2</span>        <span class="o">|</span>  <span class="n">c3</span>  <span class="o">|</span>   <span class="n">c4</span>   
<span class="c1">----+------------------+------+--------</span>
  <span class="mi">1</span> <span class="o">|</span> <span class="n">InsertedValue_10</span> <span class="o">|</span> <span class="mi">1001</span> <span class="o">|</span> <span class="mi">124</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">2</span> <span class="o">|</span> <span class="n">InsertedValue_11</span> <span class="o">|</span> <span class="mi">1002</span> <span class="o">|</span> <span class="mi">125</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">3</span> <span class="o">|</span> <span class="n">InsertedValue_12</span> <span class="o">|</span> <span class="mi">1003</span> <span class="o">|</span> <span class="mi">126</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">4</span> <span class="o">|</span> <span class="n">InsertedValue_13</span> <span class="o">|</span> <span class="mi">1004</span> <span class="o">|</span> <span class="mi">127</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">5</span> <span class="o">|</span> <span class="n">InsertedValue_14</span> <span class="o">|</span> <span class="mi">1005</span> <span class="o">|</span> <span class="mi">128</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">6</span> <span class="o">|</span> <span class="n">InsertedValue_15</span> <span class="o">|</span> <span class="mi">1006</span> <span class="o">|</span> <span class="mi">129</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">7</span> <span class="o">|</span> <span class="n">InsertedValue_16</span> <span class="o">|</span> <span class="mi">1007</span> <span class="o">|</span> <span class="mi">130</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">8</span> <span class="o">|</span> <span class="n">InsertedValue_17</span> <span class="o">|</span> <span class="mi">1008</span> <span class="o">|</span> <span class="mi">131</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">9</span> <span class="o">|</span> <span class="n">InsertedValue_18</span> <span class="o">|</span> <span class="mi">1009</span> <span class="o">|</span> <span class="mi">132</span><span class="p">.</span><span class="mi">45</span> 
 <span class="mi">10</span> <span class="o">|</span> <span class="n">InsertedValue_19</span> <span class="o">|</span> <span class="mi">1010</span> <span class="o">|</span> <span class="mi">133</span><span class="p">.</span><span class="mi">45</span> 
<span class="p">(</span><span class="mi">10</span> <span class="k">rows</span><span class="p">)</span>

<span class="n">Query</span> <span class="mi">20260410</span><span class="n">_160854_00140_zb4hc</span><span class="p">,</span> <span class="n">FINISHED</span><span class="p">,</span> <span class="mi">1</span> <span class="n">node</span>
<span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="p">:</span><span class="mi">8080</span><span class="o">/</span><span class="n">ui</span><span class="o">/</span><span class="n">query</span><span class="p">.</span><span class="n">html</span><span class="o">?</span><span class="mi">20260410</span><span class="n">_160854_00140_zb4hc</span>
<span class="n">Splits</span><span class="p">:</span> <span class="mi">27</span> <span class="n">total</span><span class="p">,</span> <span class="mi">27</span> <span class="n">done</span> <span class="p">(</span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="o">%</span><span class="p">)</span>
<span class="p">[</span><span class="n">Latency</span><span class="p">:</span> <span class="n">client</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">192</span><span class="n">ms</span><span class="p">,</span> <span class="n">server</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">183</span><span class="n">ms</span><span class="p">]</span> <span class="p">[</span><span class="mi">1</span><span class="p">.</span><span class="mi">01</span><span class="n">K</span> <span class="k">rows</span><span class="p">,</span> <span class="mi">13</span><span class="n">KB</span><span class="p">]</span> <span class="p">[</span><span class="mi">5</span><span class="p">.</span><span class="mi">52</span><span class="n">K</span> <span class="k">rows</span><span class="o">/</span><span class="n">s</span><span class="p">,</span> <span class="mi">71</span><span class="p">.</span><span class="mi">3</span><span class="n">KB</span><span class="o">/</span><span class="n">s</span><span class="p">]</span>

<span class="n">presto</span><span class="o">&gt;</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab2</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">c1</span> <span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
 <span class="n">c1</span> <span class="o">|</span>       <span class="n">c2</span>        <span class="o">|</span>  <span class="n">c3</span>  <span class="o">|</span>   <span class="n">c4</span>   
<span class="c1">----+-----------------+------+--------</span>
  <span class="mi">1</span> <span class="o">|</span> <span class="n">UpdatedValue_10</span> <span class="o">|</span> <span class="mi">1001</span> <span class="o">|</span> <span class="mi">124</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">2</span> <span class="o">|</span> <span class="n">UpdatedValue_11</span> <span class="o">|</span> <span class="mi">1002</span> <span class="o">|</span> <span class="mi">125</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">3</span> <span class="o">|</span> <span class="n">UpdatedValue_12</span> <span class="o">|</span> <span class="mi">1003</span> <span class="o">|</span> <span class="mi">126</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">4</span> <span class="o">|</span> <span class="n">UpdatedValue_13</span> <span class="o">|</span> <span class="mi">1004</span> <span class="o">|</span> <span class="mi">127</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">5</span> <span class="o">|</span> <span class="n">UpdatedValue_14</span> <span class="o">|</span> <span class="mi">1005</span> <span class="o">|</span> <span class="mi">128</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">6</span> <span class="o">|</span> <span class="n">UpdatedValue_15</span> <span class="o">|</span> <span class="mi">1006</span> <span class="o">|</span> <span class="mi">129</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">7</span> <span class="o">|</span> <span class="n">UpdatedValue_16</span> <span class="o">|</span> <span class="mi">1007</span> <span class="o">|</span> <span class="mi">130</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">8</span> <span class="o">|</span> <span class="n">UpdatedValue_17</span> <span class="o">|</span> <span class="mi">1008</span> <span class="o">|</span> <span class="mi">131</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">9</span> <span class="o">|</span> <span class="n">UpdatedValue_18</span> <span class="o">|</span> <span class="mi">1009</span> <span class="o">|</span> <span class="mi">132</span><span class="p">.</span><span class="mi">45</span> 
 <span class="mi">10</span> <span class="o">|</span> <span class="n">UpdatedValue_19</span> <span class="o">|</span> <span class="mi">1010</span> <span class="o">|</span> <span class="mi">133</span><span class="p">.</span><span class="mi">45</span> 
<span class="p">(</span><span class="mi">10</span> <span class="k">rows</span><span class="p">)</span>

<span class="n">Query</span> <span class="mi">20260410</span><span class="n">_160912_00142_zb4hc</span><span class="p">,</span> <span class="n">FINISHED</span><span class="p">,</span> <span class="mi">1</span> <span class="n">node</span>
<span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="p">:</span><span class="mi">8080</span><span class="o">/</span><span class="n">ui</span><span class="o">/</span><span class="n">query</span><span class="p">.</span><span class="n">html</span><span class="o">?</span><span class="mi">20260410</span><span class="n">_160912_00142_zb4hc</span>
<span class="n">Splits</span><span class="p">:</span> <span class="mi">27</span> <span class="n">total</span><span class="p">,</span> <span class="mi">27</span> <span class="n">done</span> <span class="p">(</span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="o">%</span><span class="p">)</span>
<span class="p">[</span><span class="n">Latency</span><span class="p">:</span> <span class="n">client</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">197</span><span class="n">ms</span><span class="p">,</span> <span class="n">server</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">187</span><span class="n">ms</span><span class="p">]</span> <span class="p">[</span><span class="mi">1</span><span class="p">.</span><span class="mi">01</span><span class="n">K</span> <span class="k">rows</span><span class="p">,</span> <span class="mi">13</span><span class="n">KB</span><span class="p">]</span> <span class="p">[</span><span class="mi">5</span><span class="p">.</span><span class="mi">4</span><span class="n">K</span> <span class="k">rows</span><span class="o">/</span><span class="n">s</span><span class="p">,</span> <span class="mi">69</span><span class="p">.</span><span class="mi">7</span><span class="n">KB</span><span class="o">/</span><span class="n">s</span><span class="p">]</span>

<span class="n">presto</span><span class="o">&gt;</span> <span class="n">MERGE</span> <span class="k">INTO</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab</span> <span class="k">as</span> <span class="n">t1</span>
     <span class="o">-&gt;</span> <span class="k">USING</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab2</span> <span class="k">as</span> <span class="n">t2</span>
     <span class="o">-&gt;</span> <span class="k">ON</span> <span class="n">t1</span><span class="p">.</span><span class="n">c1</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c1</span>
     <span class="o">-&gt;</span> <span class="k">WHEN</span> <span class="n">MATCHED</span> <span class="k">THEN</span>
     <span class="o">-&gt;</span> <span class="k">UPDATE</span> <span class="k">SET</span>
     <span class="o">-&gt;</span>  <span class="n">c2</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c2</span>
     <span class="o">-&gt;</span> <span class="p">,</span> <span class="n">c3</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c3</span>
     <span class="o">-&gt;</span> <span class="p">,</span> <span class="n">c4</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c4</span>
     <span class="o">-&gt;</span> <span class="k">WHEN</span> <span class="k">NOT</span> <span class="n">MATCHED</span> <span class="k">THEN</span>
     <span class="o">-&gt;</span>     <span class="k">INSERT</span> <span class="p">(</span><span class="n">c1</span><span class="p">,</span> <span class="n">c2</span><span class="p">,</span> <span class="n">c3</span><span class="p">,</span> <span class="n">c4</span><span class="p">)</span>
     <span class="o">-&gt;</span>     <span class="k">VALUES</span> <span class="p">(</span><span class="n">t2</span><span class="p">.</span><span class="n">c1</span><span class="p">,</span> <span class="n">t2</span><span class="p">.</span><span class="n">c2</span><span class="p">,</span> <span class="n">t2</span><span class="p">.</span><span class="n">c3</span><span class="p">,</span> <span class="n">t2</span><span class="p">.</span><span class="n">c4</span><span class="p">);</span>
<span class="n">MERGE</span><span class="p">:</span> <span class="mi">1000</span> <span class="k">rows</span>

<span class="n">Query</span> <span class="mi">20260410</span><span class="n">_160929_00143_zb4hc</span><span class="p">,</span> <span class="n">FINISHED</span><span class="p">,</span> <span class="mi">1</span> <span class="n">node</span>
<span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="p">:</span><span class="mi">8080</span><span class="o">/</span><span class="n">ui</span><span class="o">/</span><span class="n">query</span><span class="p">.</span><span class="n">html</span><span class="o">?</span><span class="mi">20260410</span><span class="n">_160929_00143_zb4hc</span>
<span class="n">Splits</span><span class="p">:</span> <span class="mi">102</span> <span class="n">total</span><span class="p">,</span> <span class="mi">102</span> <span class="n">done</span> <span class="p">(</span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="o">%</span><span class="p">)</span>
<span class="p">[</span><span class="n">Latency</span><span class="p">:</span> <span class="n">client</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">01</span><span class="p">,</span> <span class="n">server</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">01</span><span class="p">]</span> <span class="p">[</span><span class="mi">2</span><span class="p">.</span><span class="mi">02</span><span class="n">K</span> <span class="k">rows</span><span class="p">,</span> <span class="mi">26</span><span class="p">.</span><span class="mi">1</span><span class="n">KB</span><span class="p">]</span> <span class="p">[</span><span class="mi">3</span><span class="p">.</span><span class="mi">08</span><span class="n">K</span> <span class="k">rows</span><span class="o">/</span><span class="n">s</span><span class="p">,</span> <span class="mi">39</span><span class="p">.</span><span class="mi">8</span><span class="n">KB</span><span class="o">/</span><span class="n">s</span><span class="p">]</span>

<span class="n">presto</span><span class="o">&gt;</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">c1</span> <span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
 <span class="n">c1</span> <span class="o">|</span>       <span class="n">c2</span>        <span class="o">|</span>  <span class="n">c3</span>  <span class="o">|</span>   <span class="n">c4</span>   
<span class="c1">----+-----------------+------+--------</span>
  <span class="mi">1</span> <span class="o">|</span> <span class="n">UpdatedValue_10</span> <span class="o">|</span> <span class="mi">1001</span> <span class="o">|</span> <span class="mi">124</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">2</span> <span class="o">|</span> <span class="n">UpdatedValue_11</span> <span class="o">|</span> <span class="mi">1002</span> <span class="o">|</span> <span class="mi">125</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">3</span> <span class="o">|</span> <span class="n">UpdatedValue_12</span> <span class="o">|</span> <span class="mi">1003</span> <span class="o">|</span> <span class="mi">126</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">4</span> <span class="o">|</span> <span class="n">UpdatedValue_13</span> <span class="o">|</span> <span class="mi">1004</span> <span class="o">|</span> <span class="mi">127</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">5</span> <span class="o">|</span> <span class="n">UpdatedValue_14</span> <span class="o">|</span> <span class="mi">1005</span> <span class="o">|</span> <span class="mi">128</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">6</span> <span class="o">|</span> <span class="n">UpdatedValue_15</span> <span class="o">|</span> <span class="mi">1006</span> <span class="o">|</span> <span class="mi">129</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">7</span> <span class="o">|</span> <span class="n">UpdatedValue_16</span> <span class="o">|</span> <span class="mi">1007</span> <span class="o">|</span> <span class="mi">130</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">8</span> <span class="o">|</span> <span class="n">UpdatedValue_17</span> <span class="o">|</span> <span class="mi">1008</span> <span class="o">|</span> <span class="mi">131</span><span class="p">.</span><span class="mi">45</span> 
  <span class="mi">9</span> <span class="o">|</span> <span class="n">UpdatedValue_18</span> <span class="o">|</span> <span class="mi">1009</span> <span class="o">|</span> <span class="mi">132</span><span class="p">.</span><span class="mi">45</span> 
 <span class="mi">10</span> <span class="o">|</span> <span class="n">UpdatedValue_19</span> <span class="o">|</span> <span class="mi">1010</span> <span class="o">|</span> <span class="mi">133</span><span class="p">.</span><span class="mi">45</span> 
<span class="p">(</span><span class="mi">10</span> <span class="k">rows</span><span class="p">)</span>

<span class="n">Query</span> <span class="mi">20260410</span><span class="n">_160940_00144_zb4hc</span><span class="p">,</span> <span class="n">FINISHED</span><span class="p">,</span> <span class="mi">1</span> <span class="n">node</span>
<span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="p">:</span><span class="mi">8080</span><span class="o">/</span><span class="n">ui</span><span class="o">/</span><span class="n">query</span><span class="p">.</span><span class="n">html</span><span class="o">?</span><span class="mi">20260410</span><span class="n">_160940_00144_zb4hc</span>
<span class="n">Splits</span><span class="p">:</span> <span class="mi">28</span> <span class="n">total</span><span class="p">,</span> <span class="mi">28</span> <span class="n">done</span> <span class="p">(</span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="o">%</span><span class="p">)</span>
<span class="p">[</span><span class="n">Latency</span><span class="p">:</span> <span class="n">client</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">236</span><span class="n">ms</span><span class="p">,</span> <span class="n">server</span><span class="o">-</span><span class="n">side</span><span class="p">:</span> <span class="mi">228</span><span class="n">ms</span><span class="p">]</span> <span class="p">[</span><span class="mi">2</span><span class="p">.</span><span class="mi">01</span><span class="n">K</span> <span class="k">rows</span><span class="p">,</span> <span class="mi">21</span><span class="p">.</span><span class="mi">8</span><span class="n">KB</span><span class="p">]</span> <span class="p">[</span><span class="mi">8</span><span class="p">.</span><span class="mi">81</span><span class="n">K</span> <span class="k">rows</span><span class="o">/</span><span class="n">s</span><span class="p">,</span> <span class="mi">95</span><span class="p">.</span><span class="mi">5</span><span class="n">KB</span><span class="o">/</span><span class="n">s</span><span class="p">]</span>

<span class="n">presto</span><span class="o">&gt;</span> </code></pre></figure>

<p>It took about 1 second to do what previously took 43 minutes, and the metadata stays clean too: the merge is a single statement rather than a thousand, so there is no pile of snapshots to compact.</p>

<p>Here we used the following merge command:</p>

<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="n">MERGE</span> <span class="k">INTO</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab</span> <span class="k">as</span> <span class="n">t1</span>
<span class="k">USING</span> <span class="n">iceberg</span><span class="p">.</span><span class="n">perf_test</span><span class="p">.</span><span class="n">perf_test_tab2</span> <span class="k">as</span> <span class="n">t2</span>
<span class="k">ON</span> <span class="n">t1</span><span class="p">.</span><span class="n">c1</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c1</span>
<span class="k">WHEN</span> <span class="n">MATCHED</span> <span class="k">THEN</span>
<span class="k">UPDATE</span> <span class="k">SET</span>
 <span class="n">c2</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c2</span>
<span class="p">,</span> <span class="n">c3</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c3</span>
<span class="p">,</span> <span class="n">c4</span> <span class="o">=</span> <span class="n">t2</span><span class="p">.</span><span class="n">c4</span>
<span class="k">WHEN</span> <span class="k">NOT</span> <span class="n">MATCHED</span> <span class="k">THEN</span>
    <span class="k">INSERT</span> <span class="p">(</span><span class="n">c1</span><span class="p">,</span> <span class="n">c2</span><span class="p">,</span> <span class="n">c3</span><span class="p">,</span> <span class="n">c4</span><span class="p">)</span>
    <span class="k">VALUES</span> <span class="p">(</span><span class="n">t2</span><span class="p">.</span><span class="n">c1</span><span class="p">,</span> <span class="n">t2</span><span class="p">.</span><span class="n">c2</span><span class="p">,</span> <span class="n">t2</span><span class="p">.</span><span class="n">c3</span><span class="p">,</span> <span class="n">t2</span><span class="p">.</span><span class="n">c4</span><span class="p">);</span></code></pre></figure>

<p>This says: for every row in table t2, if its c1 equals c1 of a row in table t1, update that row by copying columns c2, c3 and c4 from t2 into t1. If a row exists in t2 but not in t1, insert it into t1. That’s it. For more on Prestodb’s merge command, see the <a href="https://prestodb.io/docs/current/sql/merge.html">Prestodb merge docs</a>.</p>
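
<p>The matched / not matched rule of the MERGE statement can be illustrated with a minimal in-memory sketch. This is not how Presto or Iceberg execute a merge; the class <code class="language-plaintext highlighter-rouge">MergeSketch</code>, its <code class="language-plaintext highlighter-rouge">merge</code> method and the map-based "tables" are hypothetical names, assumed here only to show the semantics. Each map is keyed by c1 and holds c2, c3 and c4 as strings.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the MERGE semantics, for illustration only.
// Each "table" maps the join key c1 to the remaining columns {c2, c3, c4}.
class MergeSketch {

    // Applies every row of t2 to t1 and returns {updatedRows, insertedRows}.
    static int[] merge(Map<Integer, String[]> t1, Map<Integer, String[]> t2) {
        int updated = 0;
        int inserted = 0;
        for (Map.Entry<Integer, String[]> row : t2.entrySet()) {
            Integer c1 = row.getKey();
            String[] columns = row.getValue().clone();
            if (t1.containsKey(c1)) {
                // WHEN MATCHED THEN UPDATE SET c2 = t2.c2, c3 = t2.c3, c4 = t2.c4
                t1.put(c1, columns);
                updated++;
            } else {
                // WHEN NOT MATCHED THEN INSERT (c1, c2, c3, c4)
                t1.put(c1, columns);
                inserted++;
            }
        }
        return new int[] {updated, inserted};
    }

    public static void main(String[] args) {
        Map<Integer, String[]> t1 = new LinkedHashMap<>();
        t1.put(1, new String[] {"old", "10", "1.00"});
        t1.put(2, new String[] {"untouched", "20", "2.00"});

        Map<Integer, String[]> t2 = new LinkedHashMap<>();
        t2.put(1, new String[] {"new", "11", "1.10"});   // c1 = 1 exists in t1
        t2.put(3, new String[] {"added", "30", "3.00"}); // c1 = 3 does not

        int[] counts = merge(t1, t2);
        System.out.println("updated=" + counts[0] + ", inserted=" + counts[1]);
        t1.forEach((c1, rest) -> System.out.println(c1 + " -> " + String.join(", ", rest)));
    }
}
```

<p>Rows of t1 that have no counterpart in t2 (c1 = 2 in the sketch) are left as they are, which matches the statement: it has no clause that deletes or changes them.</p>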

<p>A similar phenomenon can be observed when we insert rows into an Iceberg table using one insert statement per row. For example:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">String</span> <span class="n">insertSQL</span> <span class="o">=</span> <span class="nc">String</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">"INSERT INTO \"%s\".\"%s\".\"%s\" VALUES (?, ? , ?, ?)"</span><span class="o">,</span> <span class="s">"iceberg"</span><span class="o">,</span> <span class="s">"perf_test"</span><span class="o">,</span> <span class="s">"perf_test_tab"</span><span class="o">);</span>
<span class="n">preparedStatement</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="na">prepareStatement</span><span class="o">(</span><span class="n">insertSQL</span><span class="o">);</span>
<span class="c1">// insert batch</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="no">TOTAL_RECORDS</span><span class="o">;</span> <span class="o">++</span><span class="n">i</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setInt</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="n">i</span><span class="o">);</span>
    <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setString</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="s">"InsertedValue_"</span> <span class="o">+</span> <span class="n">i</span><span class="o">);</span>
    <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setLong</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">1000L</span> <span class="o">+</span> <span class="o">(</span><span class="kt">long</span><span class="o">)</span> <span class="n">i</span><span class="o">);</span>
    <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setBigDecimal</span><span class="o">(</span><span class="mi">4</span><span class="o">,</span> <span class="o">(</span><span class="k">new</span> <span class="nc">BigDecimal</span><span class="o">(</span><span class="s">"123.45"</span><span class="o">)).</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="nc">BigDecimal</span><span class="o">(</span><span class="n">i</span><span class="o">)));</span>
    <span class="n">preparedStatement</span><span class="o">.</span><span class="na">addBatch</span><span class="o">();</span>
<span class="o">}</span>
<span class="kt">long</span> <span class="n">startExecTime</span> <span class="o">=</span> <span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
<span class="kt">int</span><span class="o">[]</span> <span class="n">results</span> <span class="o">=</span> <span class="n">preparedStatement</span><span class="o">.</span><span class="na">executeBatch</span><span class="o">();</span>
<span class="kt">long</span> <span class="n">execTime</span> <span class="o">=</span> <span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">()</span> <span class="o">-</span> <span class="n">startExecTime</span><span class="o">;</span>
<span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"✓ insert Batch of "</span> <span class="o">+</span> <span class="no">TOTAL_RECORDS</span> <span class="o">+</span> <span class="s">" executed in "</span> <span class="o">+</span> <span class="n">execTime</span> <span class="o">+</span> <span class="s">" ms"</span><span class="o">);</span></code></pre></figure>

<p>In this case, we might expect presto-jdbc to do something intelligent, because we are sending the inserts as a batch of prepared statements. However, such an optimization is difficult to generalize across connectors: an RDBMS-based connector does not need the same optimization that Iceberg does, and each insert could even target a different connector type.</p>

<p>The above insert batch can be rewritten as follows:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">String</span> <span class="n">insertSQL</span> <span class="o">=</span> <span class="s">"INSERT INTO \"%s\".\"%s\".\"%s\" VALUES (%d, 'first_row', 100, 11.20)"</span><span class="o">;</span>
<span class="nc">String</span> <span class="n">valuesAddendum</span> <span class="o">=</span> <span class="s">", (%s, '%s', %s, %.2f)"</span><span class="o">;</span>
<span class="c1">// insert batch</span>
<span class="nc">Statement</span> <span class="n">statement1</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="na">createStatement</span><span class="o">();</span>
<span class="kt">long</span> <span class="n">startExecTime</span> <span class="o">=</span> <span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="no">TOTAL_RECORDS</span><span class="o">;</span> <span class="n">i</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">200</span><span class="o">)</span> <span class="o">{</span>
    <span class="kt">long</span> <span class="n">startMicroBatchExecTime</span> <span class="o">=</span> <span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
    <span class="nc">StringBuilder</span> <span class="n">sb</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StringBuilder</span><span class="o">(</span><span class="nc">String</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="n">insertSQL</span><span class="o">,</span> <span class="s">"iceberg"</span><span class="o">,</span> <span class="s">"perf_test"</span><span class="o">,</span> <span class="s">"perf_test_tab"</span><span class="o">,</span> <span class="n">i</span><span class="o">));</span>
    <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="mi">200</span><span class="o">;</span> <span class="n">j</span><span class="o">++)</span> <span class="o">{</span>
        <span class="c1">// parenthesize (i + j) so the row number is computed before string concatenation</span>
        <span class="n">sb</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="nc">String</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="n">valuesAddendum</span><span class="o">,</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="o">,</span> <span class="s">"InsertedValue_"</span> <span class="o">+</span> <span class="o">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="o">),</span> <span class="mi">1000L</span> <span class="o">+</span> <span class="o">(</span><span class="kt">long</span><span class="o">)</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="o">,</span> <span class="o">(</span><span class="k">new</span> <span class="nc">BigDecimal</span><span class="o">(</span><span class="s">"123.45"</span><span class="o">)).</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="nc">BigDecimal</span><span class="o">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="o">))));</span>
    <span class="o">}</span>
    <span class="n">statement1</span><span class="o">.</span><span class="na">execute</span><span class="o">(</span><span class="n">sb</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
    <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"Batch no "</span> <span class="o">+</span> <span class="n">i</span> <span class="o">+</span> <span class="s">" time: "</span> <span class="o">+</span> <span class="o">(</span><span class="nc">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">()</span> <span class="o">-</span> <span class="n">startMicroBatchExecTime</span><span class="o">));</span>
<span class="o">}</span></code></pre></figure>
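
<p>The statement construction in the loop above can also be factored into a small helper that caps each multi-row insert both by row count and by statement length. The following is a minimal sketch under stated assumptions: <code class="language-plaintext highlighter-rouge">InsertBatcher</code> and <code class="language-plaintext highlighter-rouge">buildBatches</code> are hypothetical names, the 100,000 character cap is an illustrative value rather than a Presto default, and the row tuples are assumed to be already rendered and safely escaped SQL literals.</p>

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: groups pre-rendered row tuples
// such as "(1, 'a', 100, 11.20)" into multi-row INSERT statements.
class InsertBatcher {

    // Starts a new statement whenever adding the next tuple would exceed
    // maxRows rows or maxChars characters. A single tuple that is longer than
    // maxChars on its own is still emitted as a one-row statement.
    static List<String> buildBatches(String table, List<String> tuples, int maxRows, int maxChars) {
        String prefix = "INSERT INTO " + table + " VALUES ";
        List<String> statements = new ArrayList<>();
        StringBuilder sb = new StringBuilder(prefix);
        int rows = 0;
        for (String tuple : tuples) {
            // Length of the statement if this tuple (plus ", " separator) were appended.
            int needed = sb.length() + tuple.length() + (rows == 0 ? 0 : 2);
            if (rows > 0 && (rows == maxRows || needed > maxChars)) {
                statements.add(sb.toString());
                sb = new StringBuilder(prefix);
                rows = 0;
            }
            if (rows > 0) {
                sb.append(", ");
            }
            sb.append(tuple);
            rows++;
        }
        if (rows > 0) {
            statements.add(sb.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> tuples = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            // toPlainString avoids locale-dependent decimal separators.
            String c4 = new BigDecimal("123.45").add(new BigDecimal(i)).toPlainString();
            tuples.add("(" + i + ", 'InsertedValue_" + i + "', " + (1000L + i) + ", " + c4 + ")");
        }
        List<String> statements =
                buildBatches("\"iceberg\".\"perf_test\".\"perf_test_tab\"", tuples, 200, 100_000);
        // Each statement would then be passed to Statement.execute(...) as in the loop above.
        System.out.println(statements.size() + " statements built");
    }
}
```

<p>Because the helper only builds strings, it can be checked without a cluster. It also handles a row count that is not a multiple of the batch size, since the last statement simply carries the remaining rows.</p>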

<p>With this approach, each batch of 200 rows becomes a single SQL insert, so we do not create a snapshot per row. This is 100x faster than inserting a single row at a time: it completes in less than a second as opposed to minutes. In case you are wondering why we use batches of 200 when we might as well insert all 1000 rows in one go: that can be done in this case, but since there is a limit on the size of a single SQL query, it cannot be done in every case. In fact, 200 is not a magic number; an end user has to choose it carefully so that the statement does not exceed the SQL query size limit. This limit can be configured via <code class="language-plaintext highlighter-rouge">query.max-length</code> in prestodb’s <code class="language-plaintext highlighter-rouge">config.properties</code>.</p>]]></content><author><name></name></author><category term="Iceberg" /><category term="Prestodb" /><summary type="html"><![CDATA[Recently, a issue came to me with the user complaining their code takes 43 minutes to update 1000 rows. My immediate impression was, may be they have a really complex update query. A typical update query syntax is :]]></summary></entry></feed>