If you’re not using hugepages, you’re doing it wrong!

Well, there’s been a bit of a delay in my planned testing of dbVisit Replicate and Oracle GoldenGate for zero-downtime upgrades. I hope to get back to that within a couple of weeks.

Meanwhile, I recently ran across a discussion on the Oracle OTN Community forums, in the Oracle Database – General Questions forum, asking about performance and hugepages configuration.

I think my answer bears repeating, so, here is a slightly modified version:

I’m going to take a strong position on hugepages. I’ll go as far as to say that, for any non-trivial SGA size, if you’re not using hugepages, you’re doing it wrong. There are three main points to consider.

First, when allocating a large SGA, each page needs an entry in the page table. On x86-64 (a 64-bit architecture), each PTE (page table entry) is 8 bytes. So, let’s assume you are allocating a 20GB SGA. Standard shared memory pages are 4KB. So, to allocate 20GB of SGA, you need 5,242,880 4KB pages. Each of those pages requires an 8-byte entry in the page table. So, that’s 40MB of page table entries.

The second point is that, with standard 4KB pages, each process that attaches to the SGA needs its own copy of the page table. So, if you have, say, 200 dedicated server processes, that implies 200 × 40MB = 8000MB of page table entries. So, for a 20GB SGA, you have roughly 8GB of extra overhead.

Compare that to the same 20GB SGA, implemented with hugepages. 20GB, using 2MB hugepages, means 10,240 pages, and correspondingly, 10,240 page table entries. Again, a page table entry is 8 bytes. So, for the same 20GB SGA, the page table overhead is only 80KB. The other big saving is that, with hugepages, the page table is shared. So, you only need that one copy of the 80KB page table for your 20GB SGA, regardless of how many dedicated server processes attach to the SGA!
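The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate made in this post: the helper name `page_table_bytes` and its parameters are made up for illustration, and it assumes a flat 8 bytes per mapped page with no allowance for the upper levels of the page table.

```python
PTE_BYTES = 8  # size of one page table entry on x86-64, as stated above


def page_table_bytes(sga_bytes, page_bytes, processes=1, shared=False):
    """Estimate page table overhead for mapping an SGA (hypothetical helper).

    Assumes one 8-byte entry per page; if the page table is not shared,
    every attached process carries its own copy.
    """
    entries = -(-sga_bytes // page_bytes)  # ceiling division
    copies = 1 if shared else processes
    return entries * PTE_BYTES * copies


GB = 1024 ** 3
MB = 1024 ** 2
KB = 1024

sga = 20 * GB
small = page_table_bytes(sga, 4 * KB, processes=200)             # 4KB pages, one copy per process
huge = page_table_bytes(sga, 2 * MB, processes=200, shared=True)  # 2MB hugepages, one shared copy

print(f"4KB pages, 200 processes:     {small / MB:,.0f} MB")
print(f"2MB hugepages, 200 processes: {huge / KB:,.0f} KB")
```

Changing `sga` and `processes` to match your own system gives a rough idea of how much memory is at stake.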

Finally, hugepages are locked into memory, and cannot swap. So, that also can add significantly to stability.

So, depending on the size of your SGA, the number of dedicated server processes, and the total amount of RAM on the server, it’s not hard to imagine that without hugepages configured correctly, you could suffer from significant performance problems.

As I said before, if you’re not using hugepages, you’re doing it wrong!


8 comments on “If you’re not using hugepages, you’re doing it wrong!”

  1. Jeremy says:

    I didn’t know that the page table was shared with hugepages and not shared with standard pages. That’s interesting. This sent me on a google hunt but my searches are turning up a difficult maze of info so far – do you have any links on hand where I could find out more about hugepages and shared page tables?

    • mbobak says:

      Hi Jeremy,

      One more thing:

      I reached out to an Oracle contact, Greg Marsden. Greg is “Senior Director, Kernel and Sustaining Engineering for Linux and Virtualization at Oracle”, so, I consider him an authoritative source.

      Here’s his reply to me (quoted with permission):

      “Yes, this is true, and it’s a HUGE benefit of hugepages.

      I’m not finding any authoritative docs on the subject, but simple experimentation and review of the size of pagetables in /proc/meminfo will confirm.

      The shared pagedtable thing is true for all revisions of hugepages as far back as I can recall.”

      So, I think we can confidently say that page tables are indeed shared with hugepages.



  2. mbobak says:

    Hi Jeremy,

    Sorry it took me a while to respond, I was trying to come up with a definitive answer.

    I read somewhere that page tables are shared with hugepages, but sadly, it was some time ago, and I don’t know where I learned that. I also did a lot of Googling, but was not successful.

    I was at a RedHat Enterprise User Group meeting tonight, where one of the speakers was from RH and talked about performance tuning, and specifically hugepages and transparent hugepages.

    After the talk, I specifically asked him this question, and he assured me that page tables are shared with hugepages (and not shared without hugepages configured), but he was unable to provide me with a specific reference.

    I just thought of something else. I have a small (10GB SGA) database, that I have exclusive access to. So, I set it up with hugepages, and, with just the background sessions connected, I can see the following:

    Using ‘ipcs’, I can see that, with no users and just background processes connected, there are 28 processes attached to the SGA:
    [oracle@dtgogg301 ~]$ ipcs -m

    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes        nattch     status
    0x00000000 1835008    oracle     640        4096         0
    0x00000000 1867777    oracle     640        4096         0
    0xc52fc564 1900546    oracle     640        4096         0
    0x00000000 2359299    oracle     640        67108864     28
    0x00000000 2392068    oracle     640        10200547328  28
    0x99be5428 2424837    oracle     640        2097152      28

    (The nattch column shows how many processes are attached to the shared memory segments.)

    Looking at the ‘PageTables’ value in /proc/meminfo, I can see:
    [oracle@dtgogg301 ~]$ grep PageTables /proc/meminfo
    PageTables: 34148 kB

    Since this is a 64-bit kernel, each PTE (Page Table Entry) is an 8-byte pointer.
    So, we can calculate:
    [oracle@dtgogg301 ~]$ bc
    bc 1.06.95
    Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
    This is free software with ABSOLUTELY NO WARRANTY.
    For details type `warranty'.

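    34148 kB is 34,967,552 bytes, so, at 8 bytes per entry, that’s 4,370,944 page table entries. If every one of those entries mapped a 2MB hugepage, they would cover:

    4,370,944 × 2,097,152 = 9,166,533,951,488 bytes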

    Now, the SGA is 10GB, or, more specifically:
    SQL> show sga

    Total System Global Area 10,222,108,672 bytes
    Fixed Size 2,237,128 bytes
    Variable Size 3,389,001,016 bytes
    Database Buffers 6,811,549,696 bytes
    Redo Buffers 19,320,832 bytes

    So, it doesn’t quite work out, but, I think that may be because Linux is ‘lazy’ and only has page table entries for memory that has actually been mapped?

    However, the point is, we have that number of page table entries, even though there are 28 processes attached to the SGA.

    This alone seems to be convincing evidence that the pagetable entries are *not* copied on a per process basis, but are shared.

    So, I don’t have an authoritative document stating it, but the test above would seem to imply it’s true….
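The comparison in the comment above can be sketched in Python. This is an illustration only: `meminfo_kb` is a hypothetical helper, the sample text is the single `PageTables` line quoted above, and the "unshared" figure assumes every one of the 28 processes had touched the whole SGA through private 4KB page tables, which is a worst case rather than a measurement.

```python
PTE_BYTES = 8  # 8-byte page table entry on a 64-bit kernel, as stated above


def meminfo_kb(text, field):
    """Return the kB value of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


# The line reported above; on a Linux host you could pass
# open("/proc/meminfo").read() instead.
sample = "PageTables: 34148 kB"

observed_bytes = meminfo_kb(sample, "PageTables") * 1024
observed_entries = observed_bytes // PTE_BYTES

# Worst case without sharing: 28 attached processes, each with its own
# 4KB-page table covering the whole SGA reported by "show sga".
sga_bytes = 10_222_108_672
attached = 28
unshared_bytes = (sga_bytes // 4096) * PTE_BYTES * attached

print(f"observed page tables: {observed_bytes:,} bytes ({observed_entries:,} entries)")
print(f"unshared 4KB estimate: {unshared_bytes:,} bytes")
```

The observed `PageTables` figure is a small fraction of the unshared 4KB estimate, which fits the point made above, although it is not proof on its own.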

  3. […] And in case you are wondering what all this fuss is about and why I do not simply want to live with good old standard shared memory, let Mark Bobak convince you that if you are not using Huge Pages, you are doing it wrong. […]

  4. […] If you're not using Hugepages, you're doing it Wrong! […]

  5. […] If you’re not using HugePages, you’re doing it wrong!  (Mark J. Bobak) […]

  6. Can hugepages be configured on a single-instance EBS suite (DB + EBS on a single node)? Will it interfere with application functionality in any way?

    • mbobak says:

      Hi Bilal,
      Sorry for the late reply. I just noticed your comment. In cases of eBS with DB + eBS on the same server, there should be no problem using hugepages for the DB. As long as you allocate the correct number of hugepages (use Oracle’s hugepage_settings.sh script available on MOS), there should be no problem. Just keep in mind that memory allocated as hugepages is not available for any other use.

      So, be careful of how much memory you allocate, but yes, it should work fine.
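To get a feel for the sizing involved, here is a rough Python sketch that counts how many 2MB hugepages the shared memory segments of a running instance would need. It is not Oracle’s script and does not claim to reproduce its logic: `hugepages_needed` is a hypothetical helper, the segment sizes are the `bytes` column from the `ipcs -m` output shown in an earlier comment, and the choice to ignore segments smaller than one hugepage is an assumption.

```python
HUGEPAGE_BYTES = 2 * 1024 ** 2  # 2MB hugepages, as used in the post


def hugepages_needed(segment_sizes, hugepage_bytes=HUGEPAGE_BYTES):
    """Count hugepages needed to back the given shared memory segments.

    Assumption: segments smaller than one hugepage are ignored, and each
    remaining segment is rounded up to a whole number of hugepages.
    """
    total = 0
    for size in segment_sizes:
        if size >= hugepage_bytes:
            total += -(-size // hugepage_bytes)  # ceiling division
    return total


# "bytes" column from the ipcs -m output quoted in the comments above.
segments = [4096, 4096, 4096, 67108864, 10200547328, 2097152]

pages = hugepages_needed(segments)
print(f"hugepages needed: {pages}")
print(f"memory reserved:  {pages * HUGEPAGE_BYTES / 1024 ** 3:.2f} GB")
```

Whatever number you end up with, that memory is reserved for hugepages only, so leave enough ordinary memory for the application tier on the same server.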
