Software pipelining applies only to innermost loops. Moreover, inner loops that contain subroutine calls or complicated conditional branches are not software pipelined.
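As a rough illustration of the distinction, here is a minimal sketch in C. The function names (scale, axpy_plain, axpy_call) are hypothetical and not taken from any particular code. The first loop has a straight-line body and is the kind of loop a compiler can usually software pipeline; the second has a call in the loop body and typically is not pipelined unless the compiler manages to inline the call.

```c
#include <stddef.h>

/* Hypothetical helper, used only to put a call inside a loop body. */
double scale(double a, double x)
{
    return a * x;
}

/* Good candidate: innermost loop, no calls, no branches in the body. */
void axpy_plain(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Poor candidate: the call to scale() sits in the loop body.
 * Whether this loop is pipelined depends on whether the compiler
 * inlines scale(); that is an assumption to check, not a given. */
void axpy_call(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = scale(a, x[i]) + y[i];
}
```

Both functions compute the same result, so if the second form shows up in a hot loop, writing the arithmetic inline (as in the first form) is one thing to try.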
Look at statistics in the .s or .L file.
Your generated code may not have the operations you expected.
What operations did it need?
Look at the loop in the .s file.
Is it very different? Why?
Sometimes this is human error (improper code or a typo).
Sometimes this is a compiler error.
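One way to answer "what operations did it need?" is to count them by hand from the source before opening the .s file. The sketch below does this for a dot product. The function name dot and the counts in the comment are an illustration for this particular loop, not output from any compiler.

```c
#include <stddef.h>

/* Expected work per iteration, counted by hand from the source:
 *   loads    : 2  (x[i] and y[i])
 *   stores   : 0  (sum can stay in a register)
 *   flops    : 1 multiply and 1 add (one instruction on machines
 *              that have a fused multiply-add)
 *   overhead : index update and loop branch
 * If the loop in the .s file shows more than this, for example a
 * store of sum on every iteration, that difference is what to explain. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Comparing a hand count like this against the generated loop makes it easier to tell a source-level mistake from unexpected compiler behavior.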