get_report/get_reports yielding XML syntax and parser errors

Hi,

For some reports, calling get_reports/get_report from python-gvm raises XML errors in connections.py, specifically when feeding data to the parser:
self._parser.feed(data)

Here is an example when calling get_report(report_id=…, details=True)
https://pastebin.com/raw/NkcpjgfS

By varying the buffer size to pinpoint the corrupted data, I collected some examples:
xmlParseCharRef - invalid xmlChar value 4
with data such as:

b'#xe;Content-Length 1565 AB!<?xml version="1.0" encoding="ISO-8859-1"?>\n<'
or
b' Content-Length 1565 AB!<?xml version="1.0" encoding="ISO-8859-1"?>\n&'
or
b't;1.0&qu'
or
b'ersion="1.0'

XMLSyntaxError - 'Document is empty, line 1, column 1'
b'lt;!--\n Licensed to the Apache Software Foundation (ASF) under one or more\n contributor license agreements. See the NOTICE fil'
or
b'er the Apache License, Version 2.0\n (the "License"); you may not use this file except in compliance with\n the Licens'
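To illustrate the first error: code point 4 is not a legal character in XML 1.0, so a reference to it is rejected even when it is properly escaped. Below is a minimal sketch using the standard library parser rather than the one python-gvm uses, so the error text differs from xmlParseCharRef. The helper name try_feed is made up for this example.

```python
import xml.etree.ElementTree as ET


def try_feed(chunks):
    """Feed byte chunks to an incremental parser and report the outcome."""
    parser = ET.XMLParser()
    try:
        for chunk in chunks:
            parser.feed(chunk)
        parser.close()
    except ET.ParseError as exc:
        return f"rejected: {exc}"
    return "accepted"


# A reference to a control character is refused by the parser.
print(try_feed([b"<a>&#x4;</a>"]))
# The same reference split across two chunks is refused as well.
print(try_feed([b"<a>&", b"#x4;</a>"]))
# A reference to an ordinary character (U+0041) parses fine.
print(try_feed([b"<a>&#x41;</a>"]))
```

This suggests the chunk size only changes where the parser notices the problem, not whether it does.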

I cherry-picked greenbone/gvmd#965, as it seemed to deal with XML nested inside the report, but the issue persisted. It looks like gvmd is not escaping the data it sends correctly.
I am posting this here because I am not sure whether the problem lies in python-gvm or in gvmd itself.

Any thoughts?

Thank you

GVM versions

  • [GVM Libraries 11.0.0 ]
  • [OpenVAS 7.0.0 ]
  • [OSPd 2.0.0 ]
  • [ospd-openvas 1.0.0 ]
  • [Greenbone Vulnerability Manager 9.0.0 ]
  • [Greenbone Security Assistant 9.0.0 ]
  • [python-gvm 1.2.0 ]

I ended up removing all hexadecimal character references, such as &#xe; (seen truncated as #xe; in the first example above). I believe these come from headers generated by specific NVTs and are invalid in XML.
That is, in connections.py:

...
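# Needs "import re" at the top of connections.py.
clean_data = data.decode("utf-8", errors="ignore")
# Drop hexadecimal character references such as &#xe; before parsing.
clean_data = re.sub(r"&#x[0-9a-fA-F]+;", "", clean_data)
self._parser.feed(clean_data.encode())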
...

It is a bit of a sledgehammer solution, but I don't need those headers and can't think of a better one.
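A narrower variant of the same idea, as a rough sketch only: drop just the character references whose code point is not allowed in XML 1.0, and hold back a reference that is cut by a chunk boundary until the next chunk arrives. The names strip_invalid_char_refs and CharRefFilter are hypothetical and not part of python-gvm; where to hook this into connections.py is left open.

```python
import re

# Matches numeric character references: &#xHEX; or &#DEC;
_CHAR_REF = re.compile(rb"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")
# Assumption: an incomplete reference tail is never longer than this.
_MAX_REF_LEN = 16


def _is_xml_char(cp):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_char_refs(data):
    """Remove references to code points that XML 1.0 forbids."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        if hex_digits is not None:
            cp = int(hex_digits, 16)
        else:
            cp = int(dec_digits, 10)
        return match.group(0) if _is_xml_char(cp) else b""
    return _CHAR_REF.sub(_replace, data)


class CharRefFilter:
    """Filters chunks, buffering a reference split across two chunks."""

    def __init__(self):
        self._pending = b""

    def filter(self, chunk):
        data = self._pending + chunk
        self._pending = b""
        amp = data.rfind(b"&")
        # Hold back a short tail that starts with '&' but has no ';' yet.
        if amp != -1 and b";" not in data[amp:] and len(data) - amp <= _MAX_REF_LEN:
            data, self._pending = data[:amp], data[amp:]
        return strip_invalid_char_refs(data)

    def flush(self):
        """Return whatever is still held back at the end of the stream."""
        pending, self._pending = self._pending, b""
        return pending


f = CharRefFilter()
cleaned = f.filter(b"<a>x&") + f.filter(b"#xe;y&lt;&#x41;</a>") + f.flush()
print(cleaned)
```

Unlike stripping every reference, this keeps entities such as &lt; and valid references such as &#x41; intact. One limitation: it also strips matching text inside CDATA sections, where it would be literal content.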