Whilst there are lots of "PGA memory operation" events, they contributed only 0.36 seconds, whereas out of the 285.98 total time, 268.30 of that (i.e., almost all of it) was hard-core CPU burn, with minimal recursive calls.
So there is nothing "out of the ordinary" besides, of course, the heavy CPU cost.
I think you'll need to have a chat to Support on this one - you may have hit a boundary case.
It might also be an interesting exercise to break the package into smaller chunks and see what the impact is, e.g.:
Before:
package body PKG is
  procedure P1(params) is
  begin
    lots of code code code
  end;
  procedure P2(params) is
  begin
    lots of code code code
  end;
  ...
  ...
  procedure P998(params) is
  begin
    lots of code code code
  end;
  procedure P999(params) is
  begin
    lots of code code code
  end;
end;
After:
package body PKG is
  procedure P1(params) is
  begin
    code code code
  end;
  procedure P2(params) is
  begin
    code code code
  end;
  ...
  ...
  procedure P998(params) is
  begin
    child_pkg.p998(params);
  end;
  procedure P999(params) is
  begin
    child_pkg.p999(params);
  end;
end;
just so it's split over several smaller packages.
I'm not necessarily suggesting that as a "solution", but more as a way of proving whether it's related to total package size or something else.
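To put some numbers against the before/after comparison, something along these lines could be run in SQL*Plus after each variant is installed. It is only a rough sketch: PKG and CHILD_PKG are the placeholder names from the example above, and it assumes the slow operation is the compile of the package body. If the time is being burned somewhere else, substitute that operation for the ALTER statements.

```sql
-- Sketch only: PKG and CHILD_PKG are placeholder names from the example above.
-- Assumption: the expensive step is compiling the package body.
set timing on

alter package pkg compile body;
alter package child_pkg compile body;   -- only exists in the "after" variant

-- Compare source, parsed and compiled code sizes of each piece.
select name, type, source_size, parsed_size, code_size
from   user_object_size
where  name in ('PKG', 'CHILD_PKG')
order  by name, type;
```

If the elapsed time drops by far more than the sizes do once the package is split, that would point towards total package size as the trigger rather than any particular piece of code inside it.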