Openvasmd hangs occasionally at 100% CPU when resuming a stopped task

gvm-9
sqlite

#1

I observed this behavior after upgrading from OpenVAS8 to OpenVAS9 (OpenVAS Manager 7.0.3).

Resuming stopped tasks usually works as intended, but sometimes it hangs at 100% CPU for hours.
I examined a few such hangs in gdb, and their backtraces look similar:

#0  0x00007fb408c34b42 in do_fcntl (fd=8, cmd=6, arg=0x7ffe43cc1160) at ../sysdeps/unix/sysv/linux/fcntl.c:39
#1  0x00007fb408c34c19 in __libc_fcntl (fd=<optimized out>, cmd=<optimized out>) at ../sysdeps/unix/sysv/linux/fcntl.c:88
#2  0x00007fb4086782ad in ?? () from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
#3  0x00007fb408678436 in ?? () from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
#4  0x00007fb40869c522 in ?? () from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
#5  0x00007fb4086db49c in ?? () from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
#6  0x00007fb4086e1ec7 in sqlite3_step () from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
#7  0x00000000004e71d9 in sql_exec_internal (retry=retry@entry=1, stmt=0xb21c000)
#8  0x000000000047effc in sqlv (retry=retry@entry=1, sql=sql@entry=0x572000 "DELETE FROM report_counts WHERE report = %llu   AND \"user\" = %llu   AND override = %d   AND min_qod = %d", 
    args=args@entry=0x7ffe43cc16e8)
#9  0x000000000047f178 in sql (sql=sql@entry=0x572000 "DELETE FROM report_counts WHERE report = %llu   AND \"user\" = %llu   AND override = %d   AND min_qod = %d")
#10 0x00000000004b184d in report_cache_counts (report=232, clear_original=1, clear_overridden=1, users_where=<optimized out>)
#11 0x00000000004b1ae6 in trim_partial_report (report=<optimized out>)
#12 0x0000000000472813 in run_task_prepare_report (task=2, report_id=0x7ffe43cc1e20, from=<optimized out>, run_status=<optimized out>, last_stopped_report=0x7ffe43cc18b0)
#13 0x0000000000477447 in run_otp_task (task=2, scanner=1, from=1, report_id=0x7ffe43cc1e20)
#14 0x000000000047886c in run_task (task_id=0x8 <error: Cannot access memory at address 0x8>, task_id@entry=0xb230c50 "7bce3e90-fff9-4c81-b78b-04eac24c931a", report_id=0x7ffe43cc1e20, from=1)
#15 0x000000000047938a in resume_task (task_id=0xb230c50 "7bce3e90-fff9-4c81-b78b-04eac24c931a", report_id=0x7ffe43cc1e20)
#16 0x0000000000444c30 in omp_xml_handle_end_element (context=0x8, element_name=0x3affb80 <command_data> "P\f#\v", user_data=0xb2189d0, error=0xffffffffffffffff)
#17 0x00007fb40839a2a7 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00007fb40839b105 in g_markup_parse_context_parse () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x000000000046e881 in process_omp_client_input ()

I stepped through it a little in gdb, and execution never seems to return from sql_exec_internal(): it keeps looping because sqlite3_step() always returns SQLITE_BUSY. At the same time, I can usually resume other tasks without problems, and I assume they call the same function.
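
To illustrate the pattern described above, here is a minimal sketch in Python (not the openvasmd code, which is C). It assumes a throwaway database with a stand-in report_counts table; exec_with_bounded_retry and demo are hypothetical names. One connection holds an exclusive lock, and a second one retries a DELETE. Without the attempt limit, the loop would spin for as long as the lock is held.

```python
import os
import sqlite3
import tempfile
import time


def exec_with_bounded_retry(conn, sql, params=(), max_attempts=5, delay=0.05):
    """Run a statement, retrying while the database is locked, then give up."""
    for attempt in range(1, max_attempts + 1):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            return attempt
        except sqlite3.OperationalError as exc:
            # Python's sqlite3 reports SQLITE_BUSY as "database is locked".
            if "locked" not in str(exc):
                raise
            time.sleep(delay)
    raise TimeoutError(f"database still locked after {max_attempts} attempts")


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    holder = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    holder.execute("CREATE TABLE report_counts (report INTEGER)")
    holder.execute("BEGIN EXCLUSIVE")  # simulate a writer that never finishes

    waiter = sqlite3.connect(path, timeout=0)  # disable the built-in busy wait
    delete = "DELETE FROM report_counts WHERE report = ?"
    try:
        exec_with_bounded_retry(waiter, delete, (232,))
        outcome = "completed"
    except TimeoutError:
        outcome = "gave up"

    holder.execute("ROLLBACK")  # release the lock
    attempts = exec_with_bounded_retry(waiter, delete, (232,))
    waiter.close()
    holder.close()
    return outcome, attempts


if __name__ == "__main__":
    print(demo())
```

While the lock is held, every attempt fails the same way, which would match a backtrace that always ends in sqlite3_step(). Whether the manager's retry loop really has no upper bound is an assumption based on the observed behavior, not something confirmed from the source.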


#2

This question was posted in the https://community.greenbone.net/c/gce category (Description: About the Community Edition (GCE) category), which is about the downloadable, ready-to-use virtual machine.

Based on your post, it seems you have your own installation, either built from source or installed via third-party repositories. For such installations, the https://community.greenbone.net/c/gse category (Description: About the Source Edition (GSE) category) needs to be chosen.

I have moved this question to the correct category for now.


#3

Have you checked for any database locks? If SQLITE_BUSY keeps being returned, a lock held by another connection might point you to the cause.
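
One possible way to check for a lock, sketched in Python under the assumption that the database file can be opened by the user running the probe: try to start a write transaction with no busy timeout and see whether SQLite refuses. The function name write_lock_held is hypothetical. The probe briefly takes the write lock itself when the database is free, so it is best used sparingly on a live system.

```python
import pathlib
import sqlite3


def write_lock_held(db_path):
    """Return True if another connection currently blocks a write transaction."""
    # mode=rw opens an existing file only, so a mistyped path is not created.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=rw"
    conn = sqlite3.connect(uri, uri=True, timeout=0, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # needs the write lock right away
    except sqlite3.OperationalError as exc:
        # Python's sqlite3 reports SQLITE_BUSY as "database is locked".
        if "locked" in str(exc):
            return True
        raise
    else:
        conn.execute("ROLLBACK")  # release the lock taken by the probe
        return False
    finally:
        conn.close()
```

Calling write_lock_held with the path to tasks.db while openvasmd is spinning would indicate whether some other connection is holding the write lock at that moment.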


#4

I'm not sure what the best way to test that would be.
Should I check which processes have a file descriptor open on tasks.db and try to debug those?
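
A rough sketch of that check in Python, assuming a Linux system with /proc mounted and enough privileges (typically root) to read other processes' fd directories. The function name pids_with_file_open is hypothetical.

```python
import os


def pids_with_file_open(path):
    """Return the PIDs of processes that have the given file open (Linux only)."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor was closed while we were looking
    return sorted(pids)
```

Each PID returned for tasks.db can then be attached to in gdb to see which process, if any, is sitting inside an open transaction.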


#5

Maybe use PostgreSQL (PSQL) instead of SQLite. There is a full stack of tooling and how-tos for debugging PostgreSQL.