Network Working Group                                           J. Benet
Internet-Draft                                             Protocol Labs
Intended status: Standards Track                               M. Sporny
Expires: 11 April 2025                                    Digital Bazaar
                                                            J. Caballero
                                   Interplanetary File System Foundation
                                                          8 October 2024


                        The Multihash Data Format
                  draft-multiformats-multihash-latest

Abstract

   Cryptographic hash functions often generate multiple output sizes and
   encodings. This variability makes it difficult for applications to
   examine a series of bytes and determine which hash function produced
   them, and thus such context is traditionally passed alongside the
   resulting bytes in defined protocols. Multihash inlines this context
   information so that it can travel and be translated more easily,
   decoupled from specific protocols.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-multiformats-multihash/.

   Source for this draft and an issue tracker can be found at
   https://github.com/ipfs-tech/multiformats-multihash-v8.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute working
   documents as Internet-Drafts. The list of current Internet-Drafts is
   at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1. Feedback
   2. Introduction
   3. Conventions and Definitions
   4. The Multihash Fields
     4.1. Multihash Core Data Types
       4.1.1. Unsigned Variable Integer
     4.2. Multihash Fields
       4.2.1. Hash Function Identifier
       4.2.2. Digest Length
       4.2.3. Digest Value
     4.3. A Multihash Example
   5. Prior Art And Translation
     5.1. Named Information Hash
       5.1.1. Translation from multihash to named-information hash
     5.2. Using Multihashes as Namespaced UUIDs
   6. References
     6.1. Normative References
     6.2. Informative References
   Appendix A. Security Considerations
   Appendix B. Test Values
     B.1. SHA-1
   Appendix C. IANA Considerations
     C.1. Initial Values for the Multihash Identifier Registry
     C.2. The 'mh' Digest Algorithm
     C.3. The 'mh' Named Information Hash Algorithm
   Acknowledgments
   Authors' Addresses

1. Feedback

   This specification is a joint work product of The IPFS Foundation
   (https://ipfs.tech/) and the W3C Credentials Community Group
   (https://w3c-ccg.github.io/). Feedback related to this specification
   should be logged in the issue tracker
   (https://github.com/ipfs-tech/multiformats-multihash-v8/issues)
   and/or be sent to the Multiformats Mailing List at the IETF
   (mailto:multiformats@ietfa.amsl.com).

2. Introduction

   Multihash responds to evolving design patterns in systems which
   depend on cryptographically-secure hash functions, contributing to
   cryptographic agility and allowing for easier translation (e.g.
   across multiple wire formats) within a given system, and for ambient
   verifiability throughout a system, not just in the context of
   protocols. To facilitate self-describing hashes rather than
   context-bound ones, multihash inlines an identifier representing the
   hash function used (and its configuration or auxiliary inputs) as a
   prefix before the hash function output. This allows for cryptographic
   agility and provides a valuable building block to content-addressing
   systems and URI-safety mechanisms alike.

3. Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

4. The Multihash Fields

   A multihash follows the TLV (type-length-value) pattern and consists
   of several fields composed of a combination of unsigned variable
   length integers and byte information.

4.1. Multihash Core Data Types

   The following section details the core data types used by the
   Multihash data format.

4.1.1. Unsigned Variable Integer

   A data type that enables one to express an unsigned integer of
   variable length. The format uses the Unsigned Little Endian Base 128
   (ULEB128) encoding that was canonically defined in Appendix C of the
   DWARF Debugging Information Format standard, initially released in
   1993, and further specified in 2011 by IRTF [RFC6256] as
   Self-Delimiting Numeric Values or SDNVs.

   As suggested by the name, this variable length encoding is only
   capable of representing unsigned integers. Further, while there is no
   theoretical maximum integer value that can be represented by the
   format, implementations MUST NOT encode more than nine (9) bytes
   giving a practical limit of integers in a range between 0 and
   2^63 - 1.

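   As an informal illustration of the unsigned variable integer
   described in this section, the following Python sketch emits seven
   bits per byte, least significant group first, and sets the high bit
   of every byte except the last. The names encode_uvarint,
   decode_uvarint, and MAX_VARINT_BYTES are hypothetical and are not
   defined by this document. The decoder accepts non-minimal encodings,
   which is an assumption of the sketch rather than a requirement
   stated here.

```python
from typing import Tuple

# Practical limit from this section: at most nine bytes (63 bits).
MAX_VARINT_BYTES = 9


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (ULEB128)."""
    if value < 0:
        raise ValueError("unsigned varints cannot represent negative integers")
    if value >= 1 << (7 * MAX_VARINT_BYTES):
        raise ValueError("value does not fit in nine bytes")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # next seven least significant bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: another byte follows
        else:
            out.append(low7)         # high bit clear: final byte
            return bytes(out)


def decode_uvarint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at data[offset:]; return (value, bytes_consumed)."""
    value = 0
    for i in range(MAX_VARINT_BYTES):
        if offset + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("varint longer than nine bytes")
```

   For instance, encode_uvarint(300) yields the two bytes 0xAC 0x02,
   and decode_uvarint applied to those bytes returns the value 300
   together with a consumed length of 2.
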
   When encoding an unsigned variable integer, the unsigned integer is
   serialized seven bits at a time, starting with the least significant
   bits. The most significant bit in each output byte indicates if there
   is a continuation byte. It is not possible to express a signed
   integer with this data type.

   +=======+============================+======================+
   | Value | Encoding (bits)            | hexadecimal notation |
   +=======+============================+======================+
   | 1     | 00000001                   | 0x01                 |
   +-------+----------------------------+----------------------+
   | 127   | 01111111                   | 0x7F                 |
   +-------+----------------------------+----------------------+
   | 128   | 10000000 00000001          | 0x8001               |
   +-------+----------------------------+----------------------+
   | 255   | 11111111 00000001          | 0xFF01               |
   +-------+----------------------------+----------------------+
   | 300   | 10101100 00000010          | 0xAC02               |
   +-------+----------------------------+----------------------+
   | 16384 | 10000000 10000000 00000001 | 0x808001             |
   +-------+----------------------------+----------------------+

                               Table 1

   Implementations MUST restrict the size of the varint to a max of nine
   bytes (63 bits). In order to avoid memory attacks on the encoding,
   the aforementioned practical maximum length of nine bytes is used.
   There is no theoretical limit, and future specs can grow this number
   if it is truly necessary to have code or length values equal to or
   larger than 2^63.

4.2. Multihash Fields

   A multihash follows the TLV (type-length-value) pattern.

4.2.1. Hash Function Identifier

   The hash function identifier is an unsigned variable integer
   identifying the hash function. Possible values for this field are
   provided in The Multihash Identifier Registry (see IANA
   considerations below).

4.2.2. Digest Length

   The digest length is an unsigned variable integer counting the length
   of the digest in bytes.

4.2.3. Digest Value

   The digest value is the hash function digest with a length of exactly
   what is specified in the digest length, which is specified in bytes.

4.3. A Multihash Example

   For example, the following is an expression of a SHA2-256 hash in
   hexadecimal notation (spaces added for readability purposes):

   0x12 20
   41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8

   The first byte (0x12) specifies the SHA2-256 hash function. The
   second byte (0x20) specifies the length of the hash, which is 32
   bytes. The rest of the data specifies the value of the output of the
   hash function.

5. Prior Art And Translation

   In IETF's corpus of normative protocols, there are three partial
   overlaps of problem space worth familiarizing oneself with to
   minimize collisions and confusions:

   *  "Named Information Hash", specified in [RFC6920], defines a
      hierarchical URI scheme for content-identifiers, partitioned by
      enumerated hash functions. The NIH registry
      (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg)
      at IANA contains all of these.

   *  UUIDv5, aka "Namespaced UUIDs", defined in [RFC9562] section 5.5
      (https://datatracker.ietf.org/doc/html/rfc9562#uuidv5), does the
      inverse, defining a universal namespace for one hash function,
      partitioned by the application of that function to multiple URI
      schemes (i.e. DNS names, valid URLs, etc.)

   *  The IANA NIH registry
      (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg)
      has a similar shape and governance mode to the IANA hashAlgorithm
      registry
      (https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-18)
      that TLS 1.2 implementations use to compactly signal supported
      hash+signature combinations. Since the former has different
      entries for some hash functions based on output length and the
      latter does not, the two registries are not alignable.
      However, given their different contexts, collisions between the
      two would not be a practical concern for users of either.

5.1. Named Information Hash

   The "Named Information Hash" URI scheme allows for minimally
   self-describing hash strings to serve as content-identifiers for
   arbitrary binary inputs. This lightweight identifier scheme is
   defined in [RFC6920] and the supported hash-context prefixes live in
   the IANA Named Information Hash Algorithm registry
   (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg).
   Its syntactic similarity to HTTP headers and its support for MIME
   content-types
   (https://datatracker.ietf.org/doc/html/rfc6920#section-3.1) make it
   potentially useful for web use-cases, but use-cases are not
   constrained by URI scheme, only hinted at by the specification in
   sections 3 through 7.

   One limitation of the NIH system, as a binary format, is that its
   registry of headers is quite small, without space for tentative,
   experimental, or vendored entries. Some additional entries have been
   added without a binary tag at all, presumably for ASCII-only use.

5.1.1. Translation from multihash to named-information hash

   Some hash functions and output lengths specified in the Multihash
   registry below correspond to the few entries in the smaller Named
   Information Hash registry, leading to simple round-trip translations
   for multihashes produced by these dual-registered hash functions.

   Formatting a multihash with _any other_ multihash prefix as a Named
   Information Hash (only useful, of course, for consumers supporting
   both formats) is facilitated by a generic cross-registry tag for
   self-describing multihashes, first proposed to the NIH registry
   (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg)
   by Appendix B
   (https://www.ietf.org/archive/id/draft-multiformats-multihash-03.html#appendix-D.2)
   in the 2021 internet-draft (v3) of this same document.
   This also extends the NIH registry to the larger namespace of the
   multiformats registry.

   The translation is achieved as follows:

   1. Strip the prefix bytes from the hash value and use the prefix
      bytes to identify the hash function used from the registry below.

   2. If the multihash prefix corresponds to any tags in the NIH
      registry
      (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg):

      1. translate multicodec tag to NIH tag, i.e., if 0x12 (sha2-256)
         in multicodec registry, then 0x01 (sha256) in
         named-information registry

      2. transcode the hash value from "unsigned varint" to standard
         MSB binary

      3. (for binary form:) reattach new prefix to transcoded hash
         value

      4. (for ASCII form:) convert prefix to URL format, i.e.,
         ni:///sha-256; for 0x01, and reattach to base64-encoded
         transcoded hash value

   3. If multihash prefix does NOT map cleanly to a registered value in
      NIH registry
      (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg):

      1. (for binary form:) prefix existing binary multihash with 0x31
         (decimal 49, the ID registered for "mh" in Appendix C.3) to
         designate that what follows is a multicodec prefix followed by
         an ULEB128 hash value.

      2. (for ASCII form:) convert the 0x31 prefix to URL format, i.e.,
         ni:///mh; and then append a base64url, no-padding encoding of
         the entire binary multihash with prefix (and _without_ adding
         the additional base-64-url-no-padding prefix, u, if using a
         multibase (https://github.com/multiformats/multibase) library
         for this base-encoding).

5.2. Using Multihashes as Namespaced UUIDs

   Since the "Named Information Hash" URI scheme conforms to URL syntax
   (with or without an authority), each valid Named Information Hash URI
   can be assumed to be unique within the namespace of all valid URLs.
   As such, any ni:// URL (with or without an authority) can be hashed
   and used as a UUIDv5
   (https://datatracker.ietf.org/doc/html/rfc9562#uuidv5) in the URL
   namespace, i.e.
   6ba7b811-9dad-11d1-80b4-00c04fd430c8 (See section 6.6
   (https://datatracker.ietf.org/doc/html/rfc9562#namespaces)). Since
   this approach relies on SHA-1, and discards all but the most
   significant 128 bits of the hash output, its security may not be
   adequate for all applications, as noted in the specification.

   Alternative ways of using a bounded namespace could include a novel
   namespace registration for UUIDv5, or a UUIDv8 approach, to
   content-address arbitrary information with namespaced UUID variants.

6. References

6.1. Normative References

   [RFC6234]  Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
              (SHA and SHA-based HMAC and HKDF)", RFC 6234,
              DOI 10.17487/RFC6234, May 2011.

   [RFC6920]  Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
              Keranen, A., and P. Hallam-Baker, "Naming Things with
              Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013.

   [RFC7693]  Saarinen, M., Ed. and J. Aumasson, "The BLAKE2
              Cryptographic Hash and Message Authentication Code (MAC)",
              RFC 7693, DOI 10.17487/RFC7693, November 2015.

   [RFC9562]  Davis, K., Peabody, B., and P. Leach, "Universally Unique
              IDentifiers (UUIDs)", RFC 9562, DOI 10.17487/RFC9562,
              May 2024.

   [FIPS202]  "SHA-3 Standard, Permutation-Based Hash and
              Extendable-Output Functions", 1 August 2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

6.2. Informative References

   [RFC6256]  Eddy, W. and E. Davies, "Using Self-Delimiting Numeric
              Values in Protocols", RFC 6256, DOI 10.17487/RFC6256,
              May 2011.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [DWARF]    "DWARF Debugging Information Format", 1 December 2005.

Appendix A. Security Considerations

   TODO Security

Appendix B. Test Values

   The input test data for all of the examples in this section is:

   Merkle–Damgård

B.1. SHA-1

   0x11148a173fd3e32c0fa78b90fe42d305f202244e2739

   The fields for this multihash are - hashing function: sha1 (0x11),
   length: 20 (0x14), digest:
   0x8a173fd3e32c0fa78b90fe42d305f202244e2739

B.2. SHA-256

   0x122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8

   The fields for this multihash are - hashing function: sha2-256
   (0x12), length: 32 (0x20), digest:
   0x41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8

B.3. SHA-512/256

   0x132052eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4

   The fields for this multihash are - hashing function: sha2-512
   (0x13), length: 32 (0x20), digest:
   0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4

B.4. SHA-512

   0x134052eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0

   The fields for this multihash are - hashing function: sha2-512
   (0x13), length: 64 (0x40), digest:
   0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0

B.5. blake2b512

   0xb24040d91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2

   The fields for this multihash are - hashing function: blake2b-512
   (0xb240), length: 64 (0x40), digest:
   0xd91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2

B.6. blake2b256

   0xb220207d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030

   The fields for this multihash are - hashing function: blake2b-256
   (0xb220), length: 32 (0x20), digest:
   0x7d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030

B.7. blake2s256

   0xb26020a96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d

   The fields for this multihash are - hashing function: blake2s-256
   (0xb260), length: 32 (0x20), digest:
   0xa96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d

B.8. blake2s128

   0xb250100a4ec6f1629e49262d7093e2f82a3278

   The fields for this multihash are - hashing function: blake2s-128
   (0xb250), length: 16 (0x10), digest:
   0x0a4ec6f1629e49262d7093e2f82a3278

Appendix C. IANA Considerations

   TODO - format current Contributing.md document language
   (https://github.com/multiformats/multiformats/blob/master/contributing.md#multiformats-registries)
   to align better with [RFC8126]

C.1. Initial Values for the Multihash Identifier Registry

   The Multihash Identifier Registry contains hash functions supported
   by Multihash, each with its canonical name, its value in hexadecimal
   notation, and its status. The following initial entries should be
   added to the registry to be created and maintained at (the suggested
   URI): http://www.iana.org/assignments/multihash-identifiers

+========================+==========+======+==================================+
|Name                    |Identifier|Status|Specification                     |
+========================+==========+======+==================================+
|identity                |0x00      |active|n/a                               |
+------------------------+----------+------+----------------------------------+
|sha1                    |0x11      |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha2-256                |0x12      |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha2-512                |0x13      |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha3-512                |0x14      |active|[FIPS202]                         |
+------------------------+----------+------+----------------------------------+
|sha3-384                |0x15      |active|[FIPS202]                         |
+------------------------+----------+------+----------------------------------+
|sha3-256                |0x16      |active|[FIPS202]                         |
+------------------------+----------+------+----------------------------------+
|sha3-224                |0x17      |active|[FIPS202]                         |
+------------------------+----------+------+----------------------------------+
|blake3                  |0x1e      |draft |draft-aumasson-blake3 (internet-  |
|                        |          |      |draft)                            |
|                        |          |      |(https://datatracker.ietf.org/doc/|
|                        |          |      |draft-aumasson-blake3/)           |
+------------------------+----------+------+----------------------------------+
|sha2-384                |0x20      |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha2-256-trunc254-padded|0x1012    |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha2-224                |0x1013    |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha2-512-224            |0x1014    |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|sha2-512-256            |0x1015    |active|[RFC6234]                         |
+------------------------+----------+------+----------------------------------+
|k12                     |0x1d01    |draft |draft-irtf-cfrg-kangarootwelve-06 |
|                        |          |      |(https://datatracker.ietf.org/doc/|
|                        |          |      |draft-irtf-cfrg-                  |
|                        |          |      |kangarootwelve/06/)               |
+------------------------+----------+------+----------------------------------+
|blake2b-256             |0xb220    |active|[RFC7693]                         |
+------------------------+----------+------+----------------------------------+
|blake2b-512             |0xb240    |active|[RFC7693]                         |
+------------------------+----------+------+----------------------------------+
|blake2s-256             |0xb260    |active|[RFC7693]                         |
+------------------------+----------+------+----------------------------------+

                                   Table 2

   NOTE: There are many draft and experimental registrations in the
   historical community registry, which is maintained by the IPFS
   Foundation on GitHub
   (https://github.com/multiformats/multicodec/blob/master/table.csv).

C.2. The 'mh' Digest Algorithm

   This memo registers the "mh" digest-algorithm in the HTTP Digest
   Algorithm Values
   (https://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml)
   registry with the following values:

   Digest Algorithm: mh
   Description: The multibase-serialized value of a multihash-supported
      algorithm.
   References: this document
   Status: standard

C.3. The 'mh' Named Information Hash Algorithm

   This memo registers the "mh" hash algorithm in the Named Information
   Hash Algorithm
   (https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg)
   registry with the following values:

   ID: 49
   Hash Name String: mh
   Value Length: variable
   Reference: this document
   Status: current

Acknowledgments

   Thanks to Carsten Borman, Benjamin Goering, Aaron Goldman, Dirk
   Kutscher, and others for their substantial contributions to this
   document on the multiformats mailing list.

Authors' Addresses

   Juan Benet
   Protocol Labs
   Email: juan@protocol.ai
   URI: http://juan.benet.ai/

   Manu Sporny
   Digital Bazaar
   Phone: +1 540 961 4469
   Email: msporny@digitalbazaar.com
   URI: http://manu.sporny.org/

   Juan Caballero
   Interplanetary File System Foundation
   Email: bumblefudge@ipfs.tech
   URI: https://ipfs.tech/