Ticket #115 (assigned enhancement)

Opened 3 years ago

Last modified 2 years ago

Support for 8-bit domain names

Reported by: anon
Owned by: ahu
Priority: normal
Milestone:
Component: component1
Version:
Severity: normal
Keywords:
Cc:

Description

PowerDNS should support 8-bit domain names, i.e. domains containing bytes with a value >127.

Logically, the domain names stored in the backend database can be considered to be made up of Unicode characters (unless the backend database is unaware of character encoding, see below). The important question is what the desired wire encoding should be, i.e. to which character encoding domain names should be transformed when sending them on the wire or when comparing them to names received from the wire.

This wire encoding could be determined from a configuration setting (either in the global PDNS config file or perhaps even as a field in the domains/zones table -- well, maybe not). As a fall-back, the backend database's native storage encoding could be used. For example, on a PostgreSQL database, this information can be read from the "server_encoding" run-time parameter:

julian=# SHOW server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

(The main reason for falling back on the database server encoding for the wire encoding would be that the admin probably intended to express domain names in the database encoding.)

However, we definitely don't want to implement any recoding logic in PowerDNS, so this needs to be left to the database. Taking PostgreSQL as the example again, it supports setting a "client_encoding" run-time parameter:

julian=# SET client_encoding='UTF8';
SET
julian=# SHOW client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

So, PDNS would read the desired wire encoding from its config file and instruct the database server to interpret input as being in that encoding and to return query results in that encoding.
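
As a rough illustration of that step, the following Python sketch picks the encoding and builds the statement PDNS would send after connecting. The function name, the mapping table, and the notion of a single configured "wire encoding" value are assumptions made for this example; only the "client_encoding" and "server_encoding" parameters themselves come from PostgreSQL.

```python
# Hypothetical mapping from a configured wire encoding name to the
# encoding names PostgreSQL accepts for client_encoding.
PG_ENCODING_NAMES = {
    "utf-8": "UTF8",
    "utf8": "UTF8",
    "iso-8859-1": "LATIN1",
    "latin-1": "LATIN1",
    "latin1": "LATIN1",
}


def client_encoding_statement(configured, server_encoding):
    """Return the SQL that pins the session to the desired wire encoding.

    configured: value of the (hypothetical) config setting, or None if unset.
    server_encoding: result of SHOW server_encoding, used as the fall-back.
    """
    if configured is None:
        # Fall back on the encoding the database itself stores data in.
        pg_name = server_encoding
    else:
        try:
            pg_name = PG_ENCODING_NAMES[configured.strip().lower()]
        except KeyError:
            raise ValueError("unsupported wire encoding: %r" % configured)
    # Only names from the table (or the server's own answer) reach the SQL.
    return "SET client_encoding TO '%s'" % pg_name
```

With a configured value of "UTF-8" this yields the same setting as the psql session above; with no configured value it simply echoes back whatever the server reported.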

Suppose, for example, that PDNS is configured with a PostgreSQL backend and a desired wire encoding of UTF-8. PDNS would connect to the PostgreSQL database and set a "client_encoding" of "UTF8". Then, when a query for the domain name

\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF.example.com.

arrives, PDNS would pass that in a query to the PostgreSQL database. The database would interpret it as "こんにちは.example.com" (based on the configured client encoding) and execute the query.
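
To make the byte-level view concrete, here is a small Python check of the label from the example. It only demonstrates how the same bytes read under two different encodings; it says nothing about how PDNS or PostgreSQL handle the name internally.

```python
# Wire-format bytes of the first label, exactly as in the example query.
label = b"\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"

# Interpreted as UTF-8, the 15 bytes form five characters.
as_utf8 = label.decode("utf-8")

# Interpreted as ISO-8859-1, every byte is a character of its own.
as_latin1 = label.decode("iso-8859-1")

print(len(label), len(as_utf8), len(as_latin1))
```

The database can only arrive at the intended five-character name if the client encoding it was told about matches the encoding the query actually used.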

I'm sorry that I cannot offer much expertise with regard to other databases or LDAP. They should always store their data in some defined encoding, but some may be entirely unaware of character encoding issues. And even those that are aware may not offer recoding of data. In those cases, it should be acceptable for PDNS to just treat the data as opaque 8-bit strings.

Finally, one might think that there is an issue when a DNS query carries a domain name in a different encoding than the one in which the PDNS server has stored it. This is not actually an issue, though, because DNS has no concept of character encoding and treats everything as opaque 8-bit data. Thus, if a query doesn't arrive in the same encoding in which the backend data is stored, then that 8-bit domain simply does not exist at the server and PDNS can safely return an empty answer (or whatever is appropriate).
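
A minimal Python sketch of that argument follows. The dictionary is a hypothetical stand-in for the backend, the name and address are made up for illustration, and the sketch ignores the case-insensitive matching that DNS applies to ASCII letters.

```python
# Hypothetical in-memory stand-in for the backend: the name is stored
# as UTF-8 bytes ("cafe" with an acute accent on the e).
NAME = "caf\u00e9.example.com."
zone = {NAME.encode("utf-8"): "192.0.2.1"}


def lookup(qname_bytes):
    """Compare names as opaque bytes, with no decoding or recoding."""
    return zone.get(qname_bytes)


utf8_query = NAME.encode("utf-8")         # contains the bytes C3 A9
latin1_query = NAME.encode("iso-8859-1")  # contains the single byte E9

print(lookup(utf8_query))    # found
print(lookup(latin1_query))  # not found: a different byte string
```

The Latin-1 query is not rejected for being in the "wrong" encoding; it just names a byte string that has no entry.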

Change History

Changed 3 years ago by anon

To clarify, when I said:

"Logically, the domain names stored in the backend database can be considered to be made up of Unicode characters (unless the backend database is unaware of character encoding, see below). The important question is what the desired wire encoding should be ..."

then I meant that as an abstraction of all the possible cases. The UTF-8 wire encoding case should be obvious from my description above. However, if an admin wanted to have "real" binary domain names (in a Unicode/UTF-8 encoded backend database; otherwise there wouldn't be a problem anyway), then they should simply view (and store) each byte as a Unicode character with the byte's value as its code point (i.e. Unicode characters 0..255, which are by definition identical to the ISO-8859-1 AKA Latin-1 character set), and then use a wire encoding of "ISO-8859-1". That would allow for arbitrary binary domain names.
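
The byte-to-code-point mapping can be checked with a few lines of Python. The two helper names are hypothetical and exist only for this illustration.

```python
def binary_label_to_text(raw):
    """View each byte as the Unicode character with the same code point."""
    return raw.decode("iso-8859-1")


def text_to_binary_label(text):
    """Inverse mapping; fails if any code point is above 255."""
    return text.encode("iso-8859-1")


# Every possible byte value survives the round trip unchanged.
all_bytes = bytes(range(256))
as_text = binary_label_to_text(all_bytes)
print([ord(c) for c in as_text] == list(range(256)))
print(text_to_binary_label(as_text) == all_bytes)
```

Because the mapping is one-to-one over all 256 byte values, no binary label is lost or altered when it is stored as text and sent back out with a wire encoding of "ISO-8859-1".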

Changed 3 years ago by anon

Uh, somehow my e-mail address gets lost every time I click "Preview". So here it is. Bert, can you please reset this ticket's submitter address to my e-mail address?

Changed 2 years ago by ahu

  • owner changed from somebody to ahu
  • status changed from new to assigned

This will be fixed in time, but it is not a priority right now.
