We recently had a problem in production where the weblogic servers randomly ( once or twice per day) became extremely slow (basically unusable) for a period of time before recovering. We discovered that the server was running low in memory causing a perpetual garbage collection cycle to occur. This begin occurring after loading a large amount of data (several million rows) that was transferred from our old application. Up until this point the system had been humming along fine. An analysis of the thread dump showed that during this time of crisis the server was looping in the following code.
oracle.jbo.server.QueryCollection.hasNextInResultSet(QueryCollection.java:4611)
oracle.jbo.server.ViewObjectImpl.hasNextForCollection(ViewObjectImpl.java:6899)F
oracle.jbo.server.QueryCollection.hasNext(QueryCollection.java:4579)
oracle.jbo.server.QueryCollection.populateRow(QueryCollection.java:3553)
oracle.jbo.server.QueryCollection.fetch(QueryCollection.java:3387)
oracle.jbo.server.QueryCollection.get(QueryCollection.java:2188)
oracle.jbo.server.ViewRowSetImpl.getRow(ViewRowSetImpl.java:5016)
oracle.jbo.server.ViewRowSetImpl.getRow(ViewRowSetImpl.java:3242)
oracle.jbo.server.ViewObjectImpl.activateTransients(ViewObjectImpl.java:18328)
oracle.jbo.server.ViewObjectImpl.activateTransients(ViewObjectImpl.java:18289)
oracle.jbo.server.ViewObjectImpl.activateState(ViewObjectImpl.java:18512)
oracle.jbo.server.ViewObjectImpl.activateState(ViewObjectImpl.java:18407)
oracle.jbo.server.ViewRowSetIteratorImpl.activateIteratorState(ViewRowSetIteratorImpl.java:4025)
oracle.jbo.server.ViewRowSetImpl.activateIteratorState(ViewRowSetImpl.java:7235)
oracle.jbo.server.ViewObjectImpl.activateIteratorState(ViewObjectImpl.java:18742)
oracle.jbo.server.ApplicationModuleImpl.activateVOs(ApplicationModuleImpl.java:8172)
oracle.jbo.server.ApplicationModuleImpl.doActivateState(ApplicationModuleImpl.java:7918)
oracle.jbo.server.ApplicationModuleImpl.doActivateAMState(ApplicationModuleImpl.java:7884)
oracle.jbo.server.Serializer.activate(Serializer.java:296)
oracle.jbo.server.DBSerializer.activateRootAM(DBSerializer.java:330)
oracle.jbo.server.ApplicationModuleImpl.activateState(ApplicationModuleImpl.java:6207)
The looping was occuring in the getRow method of ViewRowSetImpl. The stack trace also shows that the looping is occuring as the result of a activation event to restore a previously passivated application module. This demonstrates the importance of testing code with the jbo.ampooling=false setting in order to test for passivation problems. In this case however we still might not have notice this problem because it requires both a large table and a specific sequence of user actions in order to reproduce.
Here is the getRow method from ViewRowSetImpl. The section in red was where the looping was occuring
public Row getRow(Key key)
{
Row row;
if (key != null)
{
Object keyHandle = key.getRowHandle();
if (keyHandle != null && mQRef.getSignature() == key.getSignature())
{
row = getRowFromHandle(keyHandle);
if (row != null && key.equals(row.getKey()))
{
return row;
}
}
}
//if there are rows in the collection, then match those before calling execute.
if (getFetchedRowCount() > 0)
{
for (int j = 0;
(row = getRow(j)) != null; j++)
{
if (key == null || row.getKey().equals(key))
{
return row;
}
}
}
//make sure that the query is executed before calling getRow(int);
if (!isExecuted())
{
synchronized(getSyncLock())
{
execute(false /*force*/, false /*notify*/); //no need for eventing here.
}
}
for
(int j = 0; (row = getRow(j)) != null; j++)
{
if (key == null
|| row.getKey().equals(key))
{
return row;
}
}
return null;
}
Here is what I did to duplicate the problem on my local jdeveloper instance.
- We have an entity based view object to a table that contains a large (2+ million) row set. The view object was unbounded (no bind variables)
- The range size was set to 15.
- There is an iterator binding on the page bound to this view that is bound to a popup form for adding or inserting one row of data.
- When the popup displays the unbounded query executes because the iterator executes when first used (the default setting). This may not be good for database performance (which was taking ~3 seconds) but it only retrieves the 1st range of 15 rows from the database into memory and this did not normally cause a problem so was not noticed.
- The data was entered on the form and a button “save and add another” is clicked which commits the current row and advances the form to a new row ready for input.
- At this point we have a view object that contains 1 range of rows (15) that were returned by the orignal query plus the newly inserted row that we just committed (16 rows in total).
- Now to duplicate the problem we force a passivation of this users app module. We did this by setting the pool size to 1 and opening up the same page in another browser window.
- We then cancel out of the form popup. Before the form closes it presents an “Are you sure?” dialog and it must re-activate the app module it is bound to. During activation the listed getRow() method in ViewRowSetImpl is called in order to find and sync up all of the 16 rows that were in the view when it was passivated. Since there was a newly inserted row it may be located near the end of the result set that would be returned by the unbounded query. (there was no sort order) The loop in red is the problem. The ADF framework was searching for this row by retrieving one row at a time from the database. It basically had to go through all 2 million rows and pull them into memory before it would find it. On my local debug machine this was happening extremely slow. (About 50 per second) and I would run out of memory before it finished (and there was only 390,000 rows in our dev database)
This could have been instantaneous if ADF would run a
query with the ID to retrieve it, instead of searching for it row by row. I really think this is a serious design flaw but the feedback from Oracle was that this was working as designed. I don't understand why the activation process event needs to find the rows from the database. It's purpose is to restore the state of the view object to what it was prior to passivation so why not just read the passivation record and be done with it? This problem could occur on any large table that is used for both scrolling through the data and inserting rows using the same view object.
Bottom Line and Lessons Learned.
- If you have a view object that is only used for inserting rows, make sure that you set the Tuning parameter to Retrieve from the Database "No Rows" or "At Most One Row"
- If you do need to display a table that can filter and scroll over a very large result set and also insert new rows, use a read-only view object for displaying the data and a separate Entity based view object for inserting new rows.
- If the view object had had a sort order that caused newly inserted rows to appear at the beginning of the result set, we wouldn't have noticed this problem.
- This shows how critical it is to set the tuning parameters appropriately when dealing with large tables. In our case this was a hidden time bomb waiting to happen.
No comments:
Post a Comment