3. Host and domain obfuscation

3.1. Host and domain obfuscation overview

SOSCleaner has completely re-written the host and domain obfuscation engine for the 0.4.0 release. In previous releases, all hostnames were obfuscated to obfuscateddomain.com. This could be confusing when troubleshooting issues across multiple domains.

3.1.1. Filing hostname bugs

Please open hostname obfuscation bugs using the hostname obfuscation bug template. This will ensure the proper labels are applied and we can move forward quickly with your issue.

3.2. Domain database

Domains that are obfuscation are maintained in self.dn_db, a dictionary, in {'original_domain1': 'obfuscated_domain1',...} format. Domains are obfuscated in addition to full hostnames because the domain in a configuration or in a log often makes a big difference in fixing or finding an issue.

3.2.1. Adding domains to the domain database

If obfuscating an sosreport, the FQDN of the report host is split between host and domain, and the domain is automatically added to self.dn_db.

Additional domains can be slated for obfuscation using the -d parameter on the command line. Multiple domains can be added by using multiple -d parameters, for example:

# soscleaner -d example.com -d foo.com -d someotherdomain.com mysosreport.tar.xz

would add example.com, foo.com, and someotherdomain.com to self.domains.

3.2.2. Default domains

In addition to the host’s domainname and any additional domains, soscleaner automatically adds redhat.com and localhost.localdomain to self.dn_db.

3.2.3. Processing domains

After the desired entries are added to self.domains using the above processes, self._domains2db() is called by, self.clean_report() to add all the entries to self.dn_db with their obfuscated counterparts.

3.2.4. Obfuscating subdomains

Each line in each file processed by soscleaner is processed by self._clean_line(), which calls self._sub_hostname(). This function uses a regular expression to match anything in the current line that is potentially a domain.

potential_hostnames = re.findall(r'\b[a-zA-Z0-9-\.]{1,200}\.[a-zA-Z]{1,63}\b', line)

The matches in potential_hostnames are validated againt the list of known domains using self._validate_domainname(). If the potential domain turns out to be a subdomain of a known domain, the newly matched subdomain is added to self.dn_db using self._dn2db(). For example, if example.com is a known domain, and a potential match is apps.example.com, apps.example.com will be added to the domain database and used for obfuscation going forward.

3.3. Hostname database

One of the primary functions of SOSCleaner is to obfuscate hostnames when they’re found in a file beyond just the hostname of the server itself. To aid in troubleshooting, domain names are obfuscated separately. This is to keep the integrity of the data, even though the data is being obfuscated. Obfuscated hostnames are tracked in self.hn_db, a dictionary, using the {'original_host1': 'obfuscated_host1',...} format.

3.3.1. Default hostnames

If processing a sosreport the hostname of the sosreport host is added to self.hn_db.

3.3.2. Adding hostnames

When a hostname is found that is a member of a known domain in self.dn_db, it is obfuscated as hostX.obfuscatedomainY.com, with X being an incremented number equal to the current total of found hosts, self.hostname_count. Y is equal to the unique value assigned to the corresponding domain.

3.3.3. Host short name

There are many occurrences of the host-only part of the server’s hostname in an sosreport and log files in general. These are obfuscated explicitly in self._sub_hostname(). When an soscleaner run is started, the host’s hostname is stored as self.hostname. This is explicitly searched for in each line by soscleaner.

3.3.4. Short domains

There are a few short domain names that soscleaner obfuscates. By default, localhost and localdomain are added to self.short_domains, and are explicitly searched out and replaced in each line.

Short domains aren’t editable

Currently there isn’t a way to add additional entries to self.short_domains.

3.4. Hostname and Domainname reports

At the conclusion of a soscleaner run, the domain and hostname mappings are recorded in self.report_dir/<SESSION_ID>-hostname.csv and self.report_dir/<SESSION_ID>-dn.csv, respectively. If an SOSCleaner session fails to complete, these reports aren’t created.