When any kernel routine wants memory, it ends up calling
get_free_page(). This is at a lower level than kmalloc() (in
fact, kmalloc() uses get_free_page() when it needs more
memory).
get_free_page() takes one parameter, a priority. Possible
values are GFP_BUFFER, GFP_KERNEL, and
GFP_ATOMIC. It takes a page off the free_page_list,
updates mem_map, zeroes the page, and returns the physical
address of the page. (Note that kmalloc() also returns a physical
address; the logic of the mm depends on the identity map between
logical and physical addresses.)
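The happy path described above can be sketched in user space. This is a minimal model, not the kernel's code: the pool size, the union sketch_page type, and the sketch_ function names are assumptions made for illustration, and only the names free_page_list and mem_map come from the text.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_PAGES  8      /* hypothetical: a tiny pool for illustration */

/* A free page stores the link to the next free page in its own first bytes. */
union sketch_page {
    union sketch_page *next;
    unsigned char bytes[SKETCH_PAGE_SIZE];
};

static union sketch_page pool[SKETCH_NR_PAGES];   /* stand-in for physical memory */
static unsigned short mem_map[SKETCH_NR_PAGES];   /* per-page use count */
static union sketch_page *free_page_list;         /* head of the free list */

/* Put every page of the pool on the free list. */
static void sketch_init(void)
{
    free_page_list = NULL;
    for (int i = 0; i < SKETCH_NR_PAGES; i++) {
        mem_map[i] = 0;
        pool[i].next = free_page_list;
        free_page_list = &pool[i];
    }
}

/* Take one page off the list, mark it used, zero it, return its address. */
static void *sketch_get_free_page(void)
{
    union sketch_page *page = free_page_list;

    if (page == NULL)
        return NULL;                   /* the hard case, not handled here */
    free_page_list = page->next;       /* unlink from the free list */
    mem_map[page - pool] = 1;          /* record one user of the page */
    memset(page, 0, sizeof *page);     /* hand out zeroed memory */
    return page;
}
```

Calling sketch_init() once and then sketch_get_free_page() repeatedly hands out each page of the pool exactly once. After that it returns NULL, which is the empty-list case.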
That itself is simple enough. The problem, of course, is that the
free_page_list may be empty. If you did not request an atomic
operation, at this stage you enter the realm of page stealing,
which we'll go into in a moment. As a last resort (and for atomic
requests), a page is torn off the secondary_page_list (as
you may have guessed, when pages are freed, the
secondary_page_list gets filled up first).
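The order of fallbacks can be made concrete with a small model that tracks only page counts. Everything here is a sketch under assumptions: the counters, the sketch_ names, and the numeric values of the GFP_ constants are invented for illustration, and page stealing is reduced to a stub that moves one page onto the free list.

```c
#include <stdbool.h>

/* Same names as in the text; the numeric values here are arbitrary. */
enum sketch_gfp { GFP_BUFFER, GFP_KERNEL, GFP_ATOMIC };

enum sketch_source { SRC_NONE, SRC_FREE_LIST, SRC_SECONDARY_LIST };

static int nr_free_pages;        /* pages on the free_page_list */
static int nr_secondary_pages;   /* pages on the secondary_page_list */
static int nr_stealable;         /* hypothetical: pages that stealing could free */

/* Stub for page stealing: moves one stealable page onto the free list. */
static bool sketch_steal_page(void)
{
    if (nr_stealable == 0)
        return false;
    nr_stealable--;
    nr_free_pages++;
    return true;
}

/* Report which list a request with the given priority is served from. */
static enum sketch_source sketch_alloc(enum sketch_gfp priority)
{
    for (;;) {
        if (nr_free_pages > 0) {
            nr_free_pages--;
            return SRC_FREE_LIST;
        }
        /* Atomic requests never steal; others retry after a successful steal. */
        if (priority == GFP_ATOMIC || !sketch_steal_page())
            break;
    }
    if (nr_secondary_pages > 0) {        /* the last resort */
        nr_secondary_pages--;
        return SRC_SECONDARY_LIST;
    }
    return SRC_NONE;
}
```

With an empty free list, one stealable page and one secondary page, a GFP_ATOMIC request is served from the secondary list without any stealing. A following GFP_KERNEL request is served from the free list after one steal.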
The actual manipulation of the page_lists and mem_map
occurs in a mysterious macro called REMOVE_FROM_MEM_QUEUE(),
which you probably never want to look into. Suffice it to say that
interrupts are disabled. [I think that this should be explained
here. It is not that hard...]
Now back to the page stealing bit. get_free_page() calls
try_to_free_page(), which repeatedly calls shrink_buffers()
and swap_out(), in that order, until it succeeds in freeing a
page. The priority is increased on each successive iteration so that
these two routines run through their page stealing loops more often.
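The retry loop might look roughly like the sketch below. The two stub_ routines and their "succeeds at" knobs are hypothetical stand-ins for shrink_buffers() and swap_out(). The round count assumes that the six iterations mentioned later in this section refer to this loop, and the rising "effort" value is only one way to model an increasing priority.

```c
#define SKETCH_NR_ROUNDS 6   /* assumption: six rounds, see the lead-in */

/* Hypothetical knobs: each stub succeeds once effort reaches its threshold. */
static int shrink_succeeds_at = SKETCH_NR_ROUNDS;
static int swap_succeeds_at = SKETCH_NR_ROUNDS;
static int shrink_calls, swap_calls;

static int stub_shrink_buffers(int effort)
{
    shrink_calls++;
    return effort >= shrink_succeeds_at;
}

static int stub_swap_out(int effort)
{
    swap_calls++;
    return effort >= swap_succeeds_at;
}

/* Try the buffer cache first, then user pages, trying harder each round. */
static int sketch_try_to_free_page(void)
{
    for (int effort = 0; effort < SKETCH_NR_ROUNDS; effort++) {
        if (stub_shrink_buffers(effort))
            return 1;
        if (stub_swap_out(effort))
            return 1;
    }
    return 0;   /* nothing could be freed */
}
```

Setting swap_succeeds_at to 2 makes the loop return 1 on the third round, after three calls to each stub. Leaving both thresholds at their defaults makes it give up after six rounds.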
Here's one run through swap_out():
- Run through the process table and get a swappable task, say Q.
- Find a user page table (not RESERVED) in Q's space.
- For each page in the table, call try_to_swap_out(page).
- Quit when a page is freed.
Note that swap_out() (called by try_to_free_page())
maintains static variables so that it can resume the search where it left
off on the previous call.
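The resume-where-you-left-off trick is just a static cursor. In this sketch the task table is collapsed into an array of counts. The array, its size, and the sketch_swap_out() name are assumptions for illustration, and the page-table walk inside each task is left out.

```c
#define SKETCH_NR_TASKS 4

/* Hypothetical: how many pages each task could still give up. */
static int swappable[SKETCH_NR_TASKS];

/*
 * Free one page from some task and return that task's index,
 * or -1 if a full sweep of the table found nothing.
 */
static int sketch_swap_out(void)
{
    static int next_task;   /* persists across calls: where to resume */

    for (int tried = 0; tried < SKETCH_NR_TASKS; tried++) {
        int t = next_task;

        if (swappable[t] > 0) {
            swappable[t]--;
            return t;       /* cursor stays put: resume on this task */
        }
        next_task = (next_task + 1) % SKETCH_NR_TASKS;
    }
    return -1;
}
```

Because the cursor is static, a task that has just given up a page is the first one examined on the next call, and tasks already found empty are not rescanned until the sweep wraps around.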
try_to_swap_out() scans the page tables of
all user processes and enforces the stealing policy:
1. Do not fiddle with RESERVED pages.
2. Age the page if it is marked accessed (1 bit).
3. Don't tamper with recently acquired pages
(last_free_pages[]).
4. Leave dirty pages with map_counts > 1 alone.
5. Decrement the map_count of clean pages.
6. Free clean pages if they are unmapped.
7. Swap dirty pages with a map_count of 1.
Of these actions, 6 and 7 will stop the stealing process, as they result in the
actual freeing of a physical page. Action 5 results in one of the
processes losing an unshared clean page that was not accessed recently
(decrement Q->rss), which is not all that bad, but the
cumulative effects of a few iterations can slow down a process
considerably. At present, there are 6 iterations, so a page shared by
6 processes can get stolen if it is clean.
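The seven policy rules can be read as one decision function applied to a single mapping of a page. The sketch below is an interpretation, not the kernel source: struct sketch_page, its flag fields, the result names, and the order in which the checks are made are all assumptions. The "recent" flag stands in for membership in last_free_pages[].

```c
enum sketch_result {
    LEFT_ALONE,   /* rules 1, 3, 4: nothing happens */
    AGED,         /* rule 2: accessed bit cleared */
    UNMAPPED,     /* rule 5 only: one mapping dropped, page still in use */
    FREED,        /* rules 5 + 6: last mapping of a clean page dropped */
    SWAPPED       /* rule 7: dirty, singly mapped page written out */
};

struct sketch_page {
    unsigned reserved : 1;
    unsigned accessed : 1;
    unsigned dirty    : 1;
    unsigned recent   : 1;   /* hypothetical: listed in last_free_pages[] */
    unsigned map_count;      /* number of mappings of this page */
};

static enum sketch_result sketch_try_to_swap_out(struct sketch_page *p)
{
    if (p->reserved || p->map_count == 0)
        return LEFT_ALONE;                  /* rule 1, plus a sanity guard */
    if (p->accessed) {
        p->accessed = 0;                    /* rule 2: give it another chance */
        return AGED;
    }
    if (p->recent)
        return LEFT_ALONE;                  /* rule 3 */
    if (p->dirty) {
        if (p->map_count > 1)
            return LEFT_ALONE;              /* rule 4 */
        p->map_count = 0;                   /* rule 7 */
        return SWAPPED;
    }
    p->map_count--;                         /* rule 5 */
    return p->map_count == 0 ? FREED : UNMAPPED;   /* rule 6 */
}
```

Only FREED and SWAPPED correspond to a physical page actually becoming free. A clean page with a map_count of 2 takes two visits to free, which is the effect described above for shared pages.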
Page table entries are updated and the TLB invalidated. [Wonder about the
latter. It seems unnecessary since accessed pages aren't offed and there
is a walk through many page tables between iterations ... maybe in case
an interrupt came along and wanted the most recently axed page?]
The actual work of freeing the page is done by free_page(), the
complement of get_free_page(). It ignores RESERVED pages,
updates mem_map, then frees the page and updates the
page_lists if it is unmapped. For swapping (in 7 above),
write_swap_page() gets called and does nothing remarkable from the
memory management perspective.
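A count-only model of free_page() is sketched below. It is a guess at the shape of the logic, under stated assumptions: the SKETCH_RESERVED flag bit, the SKETCH_SECONDARY_TARGET size, and the counters are invented for illustration. The only behaviour taken from the text is that RESERVED pages are ignored, that mem_map is updated, and that the secondary list is refilled before the free list.

```c
#define SKETCH_RESERVED          0x8000u   /* hypothetical flag bit in a mem_map entry */
#define SKETCH_SECONDARY_TARGET  2         /* hypothetical size of the reserve */

static int nr_free_pages;        /* pages on the free_page_list */
static int nr_secondary_pages;   /* pages on the secondary_page_list */

/*
 * Drop one reference to a page, given its mem_map entry.
 * Returns 1 if the page actually went back onto a list, 0 otherwise.
 */
static int sketch_free_page(unsigned short *map_entry)
{
    if (*map_entry & SKETCH_RESERVED)
        return 0;                       /* never touch RESERVED pages */
    if (*map_entry == 0)
        return 0;                       /* already free: nothing to do */
    if (--*map_entry > 0)
        return 0;                       /* still mapped somewhere else */

    /* Unmapped: refill the reserve first, then the ordinary free list. */
    if (nr_secondary_pages < SKETCH_SECONDARY_TARGET)
        nr_secondary_pages++;
    else
        nr_free_pages++;
    return 1;
}
```

A page with a use count of 2 has to be freed twice before it lands on a list, and the first SKETCH_SECONDARY_TARGET pages freed go to the secondary list rather than the free list.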
The details of shrink_buffers() would take us too far afield.
Essentially it looks for free buffers, then writes out dirty buffers,
then goes at busy buffers, and calls free_page() when it's able
to free all the buffers on a page.
Note that page directories and page tables along with RESERVED pages
do not get swapped, stolen or aged. They are mapped in the process
page directory through reserved page tables. They are freed only on
exit from the process.