Wednesday, September 16, 2015

ZFS compression results in workload starvation, partially ameliorated by async_write_max_active

I have a particular dataset that consists of old backup files. Rather than storing them in .tar.gz archives, where the contents remain buried (and the files cannot be culled with useful tools such as fdupes), I elected to create a zfs dataset that uses gzip-9 compression.

However, on my system, writing to the compression=gzip-9 zfs dataset starved other I/O processes, read processes in particular, rendering the system unusable.

To investigate, I used this dtrace script to examine the different classes of I/O operations on my system.

See below for the sysctl tuneable that I modified.

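The output below covers two settings, in this order: vfs.zfs.vdev.async_write_max_active=10, then vfs.zfs.vdev.async_write_max_active=3. The histograms for both settings come first, followed by the summary tables for both.

vfs.zfs.vdev.async_write_max_active=10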
  Delete                                            
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@@@                                      2        
            1024 |@@@@                                     3        
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        24       
            4096 |                                         0        

  Flush                                             
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |@@@@                                     4        
            2048 |@                                        1        
            4096 |@@@@                                     4        
            8192 |@@@@                                     4        
           16384 |@@@                                      3        
           32768 |@@@@@@@@@@@@@@@                          14       
           65536 |@@@@@@@@                                 7        
          131072 |                                         0        

  Write                                             
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |                                         96       
              64 |@                                        531      
             128 |@@                                       735      
             256 |@@                                       1067     
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@               11639    
            1024 |@@@                                      1573     
            2048 |@@                                       878      
            4096 |@@                                       785      
            8192 |@                                        612      
           16384 |                                         183      
           32768 |                                         25       
           65536 |                                         16       
          131072 |                                         13       
          262144 |                                         3        
          524288 |                                         0        

  Read                                              
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |                                         12       
             128 |                                         48       
             256 |@                                        124      
             512 |@@@@@@@@@@                               2046     
            1024 |@@@@                                     906      
            2048 |@@@@@@                                   1244     
            4096 |@@@@@@@                                  1523     
            8192 |@@@@@@                                   1228     
           16384 |@@@@                                     774      
           32768 |@                                        173      
           65536 |                                         26       
          131072 |                                         22       
          262144 |                                         5        
          524288 |                                         0

vfs.zfs.vdev.async_write_max_active=3
  Delete
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@@                                       1        
            1024 |@@@@@@@@@@@@@@@@@@@@                     12       
            2048 |@@@@@@@@                                 5        
            4096 |@@                                       1        
            8192 |@@@@@@@@                                 5        
           16384 |                                         0        

  Flush                                             
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |@@@@@@@@@                                5        
            2048 |@@@@                                     2        
            4096 |@@@@@                                    3        
            8192 |@@@@@@@@@                                5        
           16384 |                                         0        
           32768 |@@@@@@@                                  4        
           65536 |@@@@@                                    3        
          131072 |                                         0        

  Write                                             
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@                                        468      
              64 |@@@@@@                                   3741     
             128 |@@@@@@                                   3627     
             256 |@@@@@                                    3234     
             512 |@@@@@@@@@@@@@@@@@                        10520    
            1024 |@                                        916      
            2048 |@                                        608      
            4096 |@                                        476      
            8192 |@                                        476      
           16384 |                                         246      
           32768 |                                         80       
           65536 |                                         54       
          131072 |                                         11       
          262144 |                                         5        
          524288 |                                         0        

  Read                                              
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |                                         18       
             128 |@                                        70       
             256 |@                                        153      
             512 |@@@@@@@@                                 890      
            1024 |@@@@                                     445      
            2048 |@@@@@                                    594      
            4096 |@@@@@@                                   667      
            8192 |@@@@@@                                   709      
           16384 |@@@@@                                    609      
           32768 |@@@                                      381      
           65536 |@                                        130      
          131072 |                                         26       
          262144 |                                         10       
          524288 |                                         0
 
vfs.zfs.vdev.async_write_max_active=10
                              avg latency      stddev        iops  throughput
Write                               1892us      8057us       302/s    33416k/s
Delete                              2303us       732us         0/s       34k/s
Read                                7747us     15904us       135/s    17011k/s
Flush                              42742us     37072us         0/s        0k/s

vfs.zfs.vdev.async_write_max_active=3
                              avg latency      stddev        iops  throughput
Write                               1521us      8316us       407/s    30107k/s
Delete                              3557us      3001us         0/s       41k/s
Read                               14255us     27405us        78/s     8844k/s
Flush                              25081us     32016us         0/s        0k/s

In particular, the average Flush latency dropped considerably between the two runs, from 42742us to 25081us.
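
To put numbers on the comparison, here is a minimal Python sketch that computes the relative change in average latency between the first and second summary tables above. The latency values are copied from those tables; the names first_run, second_run and percent_change are only illustrative and are not part of the dtrace script.

```python
# Average latencies in microseconds, copied from the two summary tables above.
first_run = {"Write": 1892, "Delete": 2303, "Read": 7747, "Flush": 42742}
second_run = {"Write": 1521, "Delete": 3557, "Read": 14255, "Flush": 25081}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Relative change per I/O class, first table -> second table.
changes = {op: percent_change(first_run[op], second_run[op]) for op in first_run}

for op, pct in changes.items():
    print(f"{op:<8}{first_run[op]:>8}us -> {second_run[op]:>8}us  ({pct:+.1f}%)")
```

Since the two runs were not made under controlled conditions, these percentages should be read as rough indications only.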

A more exhaustive study under more carefully controlled conditions seems like a reasonable next step.
