USB Background
USB History
Universal Serial Bus (USB) is a standard interface for connecting peripheral devices to a host computer. The USB system was originally devised by a group of companies including Compaq, Digital Equipment, IBM, Intel, Microsoft, and Northern Telecom to replace the existing mixed connector system with a simpler architecture.
USB was designed to replace the multitude of cables and connectors required to connect peripheral devices to a host computer. The main goal of USB was to make the addition of peripheral devices quick and easy. All USB devices share some key characteristics to make this possible. All USB devices are self-identifying on the bus. All devices are hot-pluggable to allow for true Plug’n’Play capability. Additionally, some devices can draw power from the USB which eliminates the need for extra power adapters.
To ensure maximum interoperability the USB standard defines all aspects of the USB system from the physical layer (mechanical and electrical) all the way up to the software layer. The USB standard is maintained and enforced by the USB Implementers Forum (USB-IF). USB devices must pass a USB-IF compliance test in order to be considered in compliance and to be able to use the USB logo.
USB 1.0 was first introduced in 1996, but was not adopted widely until 1998 with USB 1.1. In 2000, USB 2.0 was released and has since become the de facto standard for connecting devices to computers and beyond. In 2008, the USB specification was expanded with USB 3.0, also known as SuperSpeed USB. USB 3.0 represents a significant change in the underlying operation of USB. To simplify the experience for the user, USB 3.0 has been designed to be plug-n-play backwards compatible with USB 2.0.
The USB 3.0 specification includes a number of significant changes, including:
- Higher data transfer rate (up to 5 Gbps)
- Increased bus power and current draw
- Improved power management
- Full duplex data communications
- Link Training and Status State Machine (LTSSM)
- Interrupt driven, instead of polling
- Streaming interface for more efficient data transfers
As of 2010, the USB standard specifies different flavors of USB: low-speed, full-speed, high-speed, and SuperSpeed. USB-IF has also released additional specs that expand the breadth of USB. These are On-The-Go (OTG) and Wireless USB. Although beyond the scope of this document, details on these specs can be found on the USB-IF website.
Architectural Overview
USB is a host-scheduled, token-based serial bus protocol. USB allows for the connection of up to 127 devices on a single USB host controller. A host PC can have multiple host controllers which increases the maximum number of USB devices that can be connected to a single computer.
Devices can be connected and disconnected at will. The host PC is responsible for installing and uninstalling drivers for the USB devices on an as-needed basis.
A single USB system comprises a USB host and one or more USB devices. There can also be zero or more USB hubs in the system. A USB hub is a special class of device. The hub allows the connection of multiple downstream devices to an upstream host or hub. In this way, the number of devices that can be physically connected to a computer can be increased.
A USB device is a peripheral device that connects to the host PC. The range of functionality of USB devices is ever increasing. The device can support either one function or many functions. For example a single multi-function printer may present several devices to the host when it is connected via USB. It can present a printer device, a scanner device, a fax device, etc.
All the devices on a single USB must share the bandwidth that is available on the bus. It is possible for a host PC to have multiple buses, each with its own separate bandwidth. On most motherboards the ports are paired, such that each bus has two downstream ports.
Figure 1: Sample USB Bus Topology.
A USB can only have a single USB host device. This host can support up to 127 different devices on a single port. There is an upper limit of 7 tiers of devices which means that a maximum of 5 hubs can be connected inline.
The USB has a tiered star topology (Figure 1). At the root tier is the USB host. All devices connect to the host either directly or via a hub. According to the USB spec, a USB host can only support a maximum of seven tiers.
USB 2.0 Specific Architecture
Figure 2: USB Broadcast
A USB 2.0 host broadcasts information to all the devices below it. Low-speed and high-speed enabled devices will only see traffic at their respective speeds. Full-speed devices can see both their speed and low-speed traffic.
USB 2.0 works through a unidirectional broadcast system. When a host sends a packet, all downstream devices will see that traffic. If the host wishes to communicate with a specific device, it must include the address of the device in the token packet. Upstream traffic (the response from devices) is only seen by the host or hubs that are directly on the return path to the host.
There are, however, a few caveats when dealing with devices of different speeds. Low-speed and high-speed devices are isolated from traffic at speeds other than their own; they only see traffic at their respective speeds. Referring to Figure 2, this means that downstream traffic to device H1 will be seen by device H2 (and vice versa). Likewise, downstream traffic to device L1 will be seen by L2 (and vice versa). Full-speed devices, however, can see traffic at their own speed as well as low-speed traffic, using a special signaling mode dubbed low-speed-over-full-speed. This means that downstream traffic to F1 will be seen by F2 (and vice versa) with standard full-speed signaling, and downstream traffic to either L1 or L2 will also be seen by both F1 and F2 through the special low-speed-over-full-speed signaling.
USB 3.0 Specific Architecture
USB 3.0 marks a significant change from the existing USB infrastructure and affects the protocol at nearly all levels. The major features of USB 3.0 will be covered briefly in this article. For detailed information please consult the USB specifications from the USB-IF.
USB 3.0 Physical Interface
The differential signaling of USB 2.0 cannot support 5 Gbps data communications, so the physical interface has been upgraded. In addition to the normal USB 2.0 signals, USB 3.0 cables and connectors have two additional pairs of differential signals: one pair for transmit and one pair for receive, as seen in Figure 3.
These two additional pairs allow for full-duplex communication over USB 3.0. Since the original USB 2.0 lines are unchanged, USB 2.0 communications can occur in parallel to USB 3.0.
USB 3.0 Power
Many of the key changes in USB 3.0 involve power and power management of USB devices.
USB 3.0 Power Distribution
The amount of current available to USB devices has been increased in USB 3.0. Unconfigured devices may draw up to 150 mA, compared with only 100 mA in USB 2.0. In USB 3.0, 150 mA is considered one unit load. Configured devices are able to draw up to 6 unit loads, or 900 mA, a significant increase from the 500 mA available in USB 2.0. The added power allows for a broader range of devices to be bus-powered.
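The unit-load arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function and table names are hypothetical, and two values not stated in the text are assumed here, namely that a configured USB 2.0 device is limited to 5 unit loads (which matches the 500 mA figure) and that VBUS is nominally 5 V.

```python
VBUS_NOMINAL_V = 5.0  # assumption: nominal bus voltage, not stated in the text

# Current per unit load, in mA, as given in the text.
UNIT_LOAD_MA = {"USB 2.0": 100, "USB 3.0": 150}
# Maximum unit loads for a configured device (USB 2.0 value is an assumption
# chosen to match the 500 mA limit quoted in the text).
MAX_UNIT_LOADS = {"USB 2.0": 5, "USB 3.0": 6}


def max_current_ma(version: str, configured: bool) -> int:
    """Maximum bus current a device may draw, in mA."""
    loads = MAX_UNIT_LOADS[version] if configured else 1
    return loads * UNIT_LOAD_MA[version]


def max_power_w(version: str, configured: bool) -> float:
    """Approximate bus power at the nominal VBUS voltage, in watts."""
    return max_current_ma(version, configured) * VBUS_NOMINAL_V / 1000.0


for version in ("USB 2.0", "USB 3.0"):
    print(version, max_current_ma(version, True), "mA,",
          max_power_w(version, True), "W")
```

Under these assumptions a configured USB 3.0 device has roughly 4.5 W available, against roughly 2.5 W for USB 2.0.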
USB 3.0 Power Management
USB 3.0 provides better power management facilities to use power more efficiently, and to help reduce overall power consumption.
Link-level power management allows the host or device to initiate a transition to a lower-level power state. There are three low power states available that are described in Section 4.
In USB 3.0, there is no longer periodic device polling and packets are no longer broadcast on the bus. It is now possible for devices to enter low power states when idle in USB 3.0 because they no longer have to manage the reception of these packets.
Low-power levels are configurable on the device level and the function level. A device can suspend all or some of its functionality when it is idle, thereby reducing its power consumption.
With Latency Tolerance Messaging, devices can report their latency tolerance to the host, allowing the host system to enter lower power states without negatively affecting the USB devices on the bus.
USB 3.0 Physical Layer
In USB 3.0, the physical layer specifies the electrical characteristics of SuperSpeed USB signals – how information is scrambled and encoded, and special signal sequences used by other layers.
Here is a brief overview of some of the new technologies specific to SuperSpeed USB.
Receiver Termination
USB 3.0 receivers terminate the transmission line by placing a small resistor to ground. Transmitters will check for this termination resistor on the receiver as a way for detecting the presence of a USB 3.0 receiver.
Data Scrambling
The physical layer uses bit scrambling to reduce electrical interference problems on the lines. However, it is possible for a transmitter to disable this feature.
8b/10b encoding
8b/10b encoding maps 8-bit symbols to 10-bit symbols with the purpose of keeping a low disparity while continuing to have enough edge transitions for clock recovery.
Disparity is kept low by taking advantage of the increased number space that 10b has to offer. Since all the 8b values take up only a subset of the 10b number space, multiple 10b symbols can be used to encode a single 8b value. Often, two different 10b symbols are used to encode an 8b value, where the two 10b symbols have different numbers of 1s and 0s. The 10b symbol that is chosen to be sent will minimize the existing disparity on the line, with the goal of having a net 50/50 distribution of 1s and 0s. For example, if a line has a running disparity of +2 (two more 1s than 0s), the next symbol on the line will have a bit pattern that has more 0s.
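The selection rule described above can be illustrated with a small Python sketch. It is not a full 8b/10b encoder: the function names are hypothetical, the starting running disparity of -1 is an assumption, and the only code group used is the widely documented K28.5 control symbol, whose two 10-bit forms are 0011111010 and 1100000101.

```python
def disparity(symbol: str) -> int:
    """Number of 1s minus number of 0s in a 10-bit symbol string."""
    return symbol.count("1") - symbol.count("0")


def choose_symbol(candidates, running_disparity: int) -> str:
    """Pick the candidate that leaves the running disparity closest to zero."""
    return min(candidates, key=lambda s: abs(running_disparity + disparity(s)))


# The two alternative 10-bit encodings of the K28.5 control symbol.
K28_5 = ("0011111010", "1100000101")

running = -1  # assumed starting disparity
sent = []
for _ in range(4):
    symbol = choose_symbol(K28_5, running)
    running += disparity(symbol)
    sent.append(symbol)

print(sent, running)
```

Sending the same symbol repeatedly makes the encoder alternate between the two forms, so the running disparity stays within one bit of balance.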
In addition, the increased number space allows for the use of certain control symbols, called K symbols which do not map to any 8b data value. USB 3.0 uses these control symbols for a number of purposes including: packet framing, elastic buffer mitigation, and data scrambler control.
Training Sequences
To accommodate the various signaling characteristics of all manufactured transmitters, cables, and connectors, SuperSpeed receivers must be trained upon connection to a transmitter. This training sequence helps configure the receiver equalization, polarity, and data scrambler in order to establish a successful communications link.
Spread Spectrum Clocking
SuperSpeed USB employs spread spectrum clocking on its signaling. The advantage of this is that rather than radiating all energy in a small frequency band at a high level, a spread spectrum clock spreads its energy in a slightly larger frequency band, which reduces the peak level at any specific frequency. This is done to help meet EMC regulations.
Low-Frequency Periodic Signaling (LFPS)
LFPS is a side-band communication channel sent on the normal SuperSpeed data lines at a lower frequency (10-50 MHz instead of 5 Gbps). This side-band helps to manage signal initiation and low-power management on a link between two ports.
Elastic Buffer
USB 3.0 devices do not share the exact same clock source. Therefore they must be able to tolerate small variations between reference clocks on the transmitter and receiver. To compensate for such differences, receivers implement elasticity buffers that add or throw away “dummy” data, called SKP ordered sets, based on the state of the buffer at the time that the SKP ordered sets were received.
USB 3.0 Link Layer
The Link Layer is responsible for establishing and maintaining a reliable channel between a host and a device. There are a number of key concepts in this layer:
Link Commands
Link Commands are used to ensure the successful transfer of a packet, link flow control, and link power management.
Link Training and Status State Machine (LTSSM)
LTSSM is a state machine that defines link connectivity and link power management. LTSSM consists of 12 states: 4 operational link states (U0, U1, U2, U3), 4 link initialization and training states (Rx.Detect, Polling, Recovery, Hot Reset), 2 link test states (Loopback and Compliance Mode), SS.Inactive (which is a link error state where USB 3.0 is non-operable) and SS.Disabled (where the SuperSpeed bus is disabled and operates as USB 2.0 only). Figure 4 maps out all the states of LTSSM and defines how the link transitions between states.
Figure 4: LTSSM State Machine
The Link Training and Status State Machine (LTSSM) is the core of the USB 3.0 link layer and defines link connectivity and link power management states and transitions. Image courtesy of USB Implementers Forum
Before a USB 3.0 device can enter the U0 operational link state, the link must be trained to synchronize the transmitter and receiver between the host and device.
Key LTSSM link states:
Rx.Detect
This is the initial power-on state where a transmitter checks for proper receiver termination to determine if its SuperSpeed partner is present on the bus. When the termination is detected, link training can begin.
Polling
During the polling state, two link partners train the link to synchronize their communications in preparation for data transmission.
U0
This is the normal operational state where SuperSpeed signaling is enabled and 5 Gbps packets are transmitted and received.
U1, U2, U3
These are low-power states where no 5 Gbps packets are transmitted. U1, U2, and U3 have increasingly longer wakeup times into U0, and thus allow transmitters to go into increasingly deeper sleeps.
USB 3.0 Protocol Layer
The USB 3.0 protocol layer manages the flow of data between devices, and specifies how the different packet structures are used.
Theory of Operations
This introduction is a general summary of the USB spec. Total Phase strongly recommends that developers consult the USB specification on the USB-IF website for detailed and up-to-date information.
USB 2.0 Connectors
Figure 5: USB Cable
A USB cable has two different types of connectors: "A" and "B". "A" connectors connect upstream towards the Host and "B" connectors connect downstream to the Devices.
USB cables have two different types of connectors: "A" and "B". "A" type connectors connect towards the host or upstream direction. "B" connectors connect to downstream devices, though many devices have captive cables eliminating the need for "B" connectors. The "A" and "B" connectors are defined in the USB spec to prevent loopbacks in the bus. This prevents a host from being connected to a host, or conversely a device to a device. It also helps enforce the tiered star topology of the bus. USB hubs have one "B" port and multiple "A" ports which makes it clear which port connects to the host and which to downstream devices.
The USB spec has been expanded to include Mini-A and Mini-B connectors to support small USB devices. The USB On-The-Go (OTG) spec has introduced the Micro-A plug, Micro-B plug and receptacle, and the Micro-AB receptacle to allow for device-to-device connections. (The previous Mini-A plug and Mini-AB receptacle have now been deprecated.)
USB 3.0 Connectors
The new USB 3.0 connectors serve two purposes. First, the connectors must be capable of physically interfacing with USB 3.0 signals to provide the ability to send and receive SuperSpeed USB data. Secondly, the connectors must be backwards compatible with USB 2.0 cables.
Figure 6: USB 3.0 Standard-A Connector
USB 3.0 Standard-A plug and receptacle. Image courtesy of USB Implementers Forum
The USB 3.0 Standard-A connector (Figure 6) is very similar in appearance to the USB 2.0 Standard-A connector. However, the USB 3.0 Standard-A connector and receptacle have 5 additional pins: a differential pair for transmitting data, a differential pair for receiving data, and the drain. USB 3.0 Standard-A plugs and receptacles are often colored blue to help differentiate them from USB 2.0.
The USB 3.0 Standard-A connector has been designed to be able to be plugged into either a USB 2.0 or USB 3.0 receptacle. Similarly, the USB 3.0 Standard-A receptacle is designed to accept both the USB 3.0 and the USB 2.0 Standard-A plugs.
Figure 7: USB 3.0 Standard-B Connector
USB 3.0 Standard-B plug and receptacle. Image courtesy of USB Implementers Forum
The USB 3.0 Standard-B connector (Figure 7) is similar to the USB 2.0 Standard-B connector, with an additional structure at the top of the plug for the additional USB 3.0 pins. Due to the distinct appearance of the USB 3.0 Standard-B plug and receptacle, they do not need to be color coded; however, many manufacturers color them blue to match the Standard-A connectors.
Given the new geometry, the USB 3.0 Standard-B plug is only compatible with USB 3.0 Standard-B receptacles. Conversely, the USB 3.0 Standard-B receptacle can accept either a USB 2.0 or USB 3.0 Standard-B plug.
Figure 8: USB 3.0 Powered-B Connector
USB 3.0 includes a variant of the Standard-B connectors which has two additional conductors to provide power to USB adapters. Image courtesy of USB Implementers Forum
A Powered-B variant of the Standard-B connector (Figure 8) is also defined by the USB 3.0 specification. The Powered-B connector has two additional pins to provide power to a USB adapter without the need for an external power supply.
Figure 9: USB 3.0 Micro-A Connector
USB 3.0 Micro-A plug and receptacle. Image courtesy of USB Implementers Forum
Figure 10: USB 3.0 Micro-B Connector
USB 3.0 Micro-B plug and receptacle. Image courtesy of USB Implementers Forum
USB 3.0 also specifies Micro-A (Figure 9) and Micro-B (Figure 10) connectors. Given the small size of the original USB 2.0 micro connectors, it was not possible to add the USB 3.0 signals in the same form factor. The USB 3.0 micro plugs cannot interface with USB 2.0 receptacles, but USB 2.0 micro plugs can interface with USB 3.0 receptacles.
USB 2.0 Signaling
All USB devices are connected by a four wire USB cable. These four lines are VBUS, GND and the twisted pair: D+ and D-. USB uses differential signaling on the two data lines. There are four possible digital line states that the bus can be in: single-ended zero (SE0), single-ended one (SE1), J, and K. The single-ended line states are defined the same regardless of the speed. However, the definitions of the J and K line states change depending on the bus speed. Their definitions are described in Table 1. All data is transmitted through the J and K line states. An SE1 condition should never be seen on the bus, except for allowances during transitions between the other line states.
Table 1: Differential Signal Encodings
Line state | D+ | D- |
Single-ended zero (SE0) | 0 | 0 |
Single-ended one (SE1) | 1 | 1 |
Low-speed J | 0 | 1 |
Low-speed K | 1 | 0 |
High-/Full-speed J | 1 | 0 |
High-/Full-speed K | 0 | 1 |
The actual data on the bus is encoded through the line states by a nonreturn-to-zero-inverted (NRZI) digital signal. In NRZI encoding, a digital 1 is represented by no change in the line state and a digital 0 is represented as a change of the line state. Thus, every time a 0 is transmitted the line state will change from J to K, or vice versa. However, if a 1 is being sent the line state will remain the same.
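The NRZI rule above can be sketched as follows. This is a minimal illustration with hypothetical function names; it works on abstract J and K line states and assumes the bus idles in the J state before the first bit.

```python
def nrzi_encode(bits, idle="J"):
    """Encode data bits as J/K line states: a 0 toggles the state, a 1 keeps it."""
    state = idle
    states = []
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        states.append(state)
    return "".join(states)


def nrzi_decode(states, idle="J"):
    """Recover data bits: no change from the previous state is a 1, a change is a 0."""
    previous = idle
    bits = []
    for state in states:
        bits.append(1 if state == previous else 0)
        previous = state
    return bits


data = [0, 1, 1, 0, 0, 1]
line = nrzi_encode(data)
print(line)               # KKKJKK
print(nrzi_decode(line))  # [0, 1, 1, 0, 0, 1]
```

Note that a long run of 1s produces no transitions at all, which is the problem that bit stuffing addresses.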
USB has no synchronizing clock line between the host and device. However, the receiver can resynchronize whenever a valid transition is seen on the bus. This is possible provided that a transition in the line state is guaranteed within a fixed period of time determined by the allowable clock skew between the receiver and transmitter. To ensure that a transition is seen on the bus within the required time, USB employs bit stuffing. After 6 consecutive 1s in a data stream (i.e. no transitions on the D+ and D- lines for 6 clock periods), a 0 is inserted to force a transition of the line states. This is performed regardless of whether the next bit would have induced a transition or not. The receiver, expecting the bit stuff, automatically removes the 0 from the data stream.
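A minimal sketch of the bit-stuffing rule described above, operating on the data bits before NRZI encoding. The function names are hypothetical, and the receiver side simply raises an error if the bit after six consecutive 1s is not the expected stuffed 0.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            out.append(0)  # stuffed bit, added regardless of the next data bit
            ones = 0
    return out


def bit_unstuff(bits):
    """Remove the stuffed 0 that follows every run of six consecutive 1s."""
    out, ones = [], 0
    expect_stuffed = False
    for bit in bits:
        if expect_stuffed:
            if bit != 0:
                raise ValueError("bit stuff error: seven consecutive 1s")
            expect_stuffed = False
            ones = 0
            continue
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            expect_stuffed = True
    return out


raw = [1] * 8
stuffed = bit_stuff(raw)
print(stuffed)  # [1, 1, 1, 1, 1, 1, 0, 1, 1]
```

The error path in `bit_unstuff` corresponds to the bit stuff error that a receiver detects when a seventh consecutive 1 arrives.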
USB 3.0 Signaling
USB 3.0 signaling occurs on two dedicated differential pairs, one for transmission and one for reception. Because the USB 3.0 bus is full-duplex, it operates differently from a USB 2.0 bus.
USB 3.0 continues to use the concept of endpoints, pipes, and the four basic types of transfers: control, interrupt, bulk, and isochronous. USB 3.0 still uses a three-part transaction of Token, Data, and Handshake, but the components are used differently. In the case of OUTs, the token is now incorporated in the data packet. In the case of INs, the token is replaced by a handshake.
There are also a number of significant changes in the USB 3.0 protocol layer to improve the efficiency of data transfers.
Unicast Communications
Packets are no longer broadcast on the USB bus, to allow for lower power states. In USB 2.0, packets are broadcast; consequently, every device must decode the packet to determine if it needs to respond. In USB 3.0, packets are unicast, meaning that packets are sent on a directed path between the host and device as specified by routing information in the packet.
There is one exception: Isochronous Timestamp Packets (ITP) are broadcast on the bus, and provide timing information to all devices in lieu of Start of Frame packets.
Asynchronous Notifications
Device polling has been eliminated in USB 3.0 to reduce bus overhead and allow for lower power states. When data is requested from a device and it is not able to respond, it can send a Not Ready packet (NRDY). When the device has freed its resources and can service the data request, it issues an Endpoint Ready packet (ERDY) informing the host that it can send another request for data.
Data Streaming
To improve data transfer performance, USB 3.0 introduces streams for bulk transfer endpoints. Streams are a protocol-supported method of multiplexing multiple data streams through a standard bulk pipe.
Bus Speed
The bus speed determines the rate at which bits are sent across the bus. There are currently four speeds at which wired USB operates: low-speed (1.5 Mbps), full-speed (12 Mbps), high-speed (480 Mbps), and SuperSpeed (5 Gbps). In order to determine the bus speed of a full-speed or low-speed device, the host must simply look at the idle state of the bus. Full-speed devices have a pull-up resistor on the D+ line, whereas low-speed devices have a pull-up resistor on the D- line. Therefore, if the D+ line is high when idle, then full-speed connectivity is established. If the D- line is high when idle, then low-speed connectivity is in effect. A full-speed device does not have to be capable of running at low-speed, and vice versa. A full-speed host or hub, however, must be capable of communicating with both full-speed and low-speed devices.
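The idle-state check described above can be sketched as a small decision function. The function name and return strings are hypothetical; treating both lines low as "no device attached" is an assumption based on the host pulling both data lines low when nothing is connected.

```python
def detect_attach_speed(d_plus_high: bool, d_minus_high: bool) -> str:
    """Classify an attached device from the idle levels of D+ and D-."""
    if d_plus_high and not d_minus_high:
        return "full-speed"   # pull-up resistor on D+
    if d_minus_high and not d_plus_high:
        return "low-speed"    # pull-up resistor on D-
    if not d_plus_high and not d_minus_high:
        return "no device"    # assumption: both lines low means nothing attached
    return "invalid"          # both lines high (SE1) should not occur at idle


print(detect_attach_speed(True, False))   # full-speed
print(detect_attach_speed(False, True))   # low-speed
```

A high-speed capable device would also be reported as full-speed here, since it initially attaches at full-speed and only switches after the high-speed handshake.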
With the introduction of high-speed USB, high-speed hosts and hubs must be able to communicate with devices of all speeds. Additionally, high-speed devices must be backward compatible for communication at full-speed with legacy hosts and hubs. To facilitate this, all high-speed hosts and devices initially operate at full-speed and a high-speed handshake must take place before a high-speed capable device and a high-speed capable host can begin operating at high-speed. The handshake begins when a high-speed capable host sees a full-speed device attached. Because high-speed devices must initially operate at full-speed when first connected, they must pull the D+ line high to identify as a full-speed device. The host will then issue a reset on the bus and wait to see if the device responds with a Chirp K, which identifies the device as being high-speed capable. If the host does not receive a Chirp K, it quits the high-speed handshake sequence and continues with normal full-speed operation. However, if the host receives a Chirp K, it responds to the device with alternating pairs of Chirp K’s and Chirp J’s to tell the device that the host is high-speed capable. Upon recognizing these alternating pairs, the device switches to high-speed operation and disconnects its pull-up resistor on the D+ line. The high-speed connection is now established and both the host and the device begin communicating at high-speed. See the USB specification for more details on the high-speed handshake.
To accommodate high-speed data rates and avoid transceiver confusion, the signaling levels of high-speed communication are much lower than those of full- and low-speed communication. Full- and low-speed devices operate with a logical high level of 3.3 V on the D+ and D- lines. For high-speed operation, signaling levels on the D+ and D- lines are reduced to 400 mV. Because the high-speed signaling levels are so low, full- and low-speed transceivers are not capable of seeing high-speed traffic.
To accommodate the high-speed signaling levels and speeds, both hosts and devices use termination resistors. In addition, during the high-speed handshake, the device must release its full-speed pull-up resistor. Often, however, the host will activate its termination resistors before the device releases its full-speed pull-up resistor. In these situations the host may not be able to pull the D+ line below the threshold level of its high-speed receivers. This may cause the host to see a spurious Chirp J (dubbed a Tiny J) on the lines. This is an artifact on the bus due to the voltage divider effect between the device's 1.5 kohm pull-up resistor and the host's 45 ohm termination resistor. Hosts and devices must be robust against this situation. Once the device has switched to high-speed operation the Tiny J will no longer be present, since the device will have released its pull-up resistor.
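The size of the Tiny J can be estimated with the standard voltage divider formula. The sketch below uses the nominal resistor values from the text and assumes the pull-up is tied to the 3.3 V logical high level mentioned earlier; the function name is hypothetical and component tolerances are ignored.

```python
def divider_voltage(v_pullup: float, r_pullup: float, r_termination: float) -> float:
    """Voltage across the termination resistor of a two-resistor divider."""
    return v_pullup * r_termination / (r_pullup + r_termination)


# Nominal values: 3.3 V pull-up supply (assumed), 1.5 kohm pull-up, 45 ohm termination.
tiny_j_volts = divider_voltage(3.3, 1500.0, 45.0)
print(f"Tiny J level on D+: {tiny_j_volts * 1000:.0f} mV")
```

With these nominal values the D+ line sits at roughly 96 mV rather than at ground, which is why a sensitive high-speed receiver may interpret it as a weak J.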
With USB 3.0, a separate SuperSpeed USB channel co-exists in parallel with the normal USB 2.0 bus. It is important to point out that SuperSpeed USB is a full-duplex bus, thus both the host and the device act as a transmitter and receiver. In order to communicate over USB 3.0, each transmitter must detect the termination on the receiver side. If the termination is not detected, the host will downgrade its communications to USB 2.0. If the termination is detected, link training begins so that the receiver can synchronize with the transmitter. Once the link is established, the link enters U0 and data communications can begin.
Endpoints and Pipes
The endpoint is the fundamental unit of communication in USB. All data is transferred through virtual pipes between the host and these endpoints. All communication between a USB host and a USB device is addressed to a specific endpoint on the device. Each device endpoint is unidirectional: it either sends data to the host or receives data from the host.
A pipe represents a data pathway between the host and the device. A pipe may be unidirectional (consisting of only one endpoint) or bidirectional (consisting of two endpoints in opposite directions).
A special pipe is the Default Control Pipe. It consists of both the input and output endpoints 0. It is required on all devices and must be available immediately after the device is powered. The host uses this pipe to identify the device and its endpoints and to configure the device.
Endpoints are not all the same. Endpoints specify their bandwidth requirements and the way that they transfer data. There are four transfer types for endpoints:
Control
Non-periodic transfers. Typically used for device configuration, commands, and status operations.
Interrupt
This is a transaction that is guaranteed to occur within a certain time interval. The device will specify the time interval at which the host should check the device to see if there is new data. This is used by input devices such as mice and keyboards.
Isochronous
Periodic and continuous transfer for time-sensitive data. There is no error checking or retransmission of the data sent in these packets. This is used for devices that need to reserve bandwidth and have a high tolerance to errors. Examples include multimedia devices for audio and video.
Bulk
General transfer scheme for large amounts of data. This is for contexts where it is more important that the data is transmitted without errors than for the data to arrive in a timely manner. Bulk transfers have the lowest priority. If the bus is busy with other transfers, this transaction may be delayed. The data is guaranteed to arrive without error. If an error is detected in the CRCs, the data will be retransmitted. Examples of this type of transfer are files from a mass storage device or the output from a scanner.
USB 2.0 Packets
All USB packets are prefaced by a SYNC field and then a Packet Identifier (PID) byte. Packets are terminated with an End-of-Packet (EOP).
The SYNC field, which is a sequence of KJ pairs followed by 2 K’s on the data lines, serves as a Start of Packet (SOP) marker and is used to synchronize the device’s transceiver with that of the host. This SYNC field is 8 bits long for full/low-speed and 32 bits long for high speed.
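The SYNC pattern described above can be generated with a short helper. The function name is hypothetical, and the pattern is expressed in J/K line states rather than voltages.

```python
def sync_field(length_bits: int) -> str:
    """Build a SYNC pattern: repeated KJ pairs ending in two K states."""
    if length_bits < 2 or length_bits % 2 != 0:
        raise ValueError("SYNC length must be an even number of bits")
    return "KJ" * (length_bits // 2 - 1) + "KK"


print(sync_field(8))        # KJKJKJKK (full/low-speed)
print(len(sync_field(32)))  # 32 (high-speed)
```

The final pair of K states breaks the alternating pattern, which is what lets the receiver identify where the SYNC field ends and the PID begins.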
The EOP field varies depending on the bus speed. For low- or full-speed buses, the EOP consists of an SE0 for two bit times. For high-speed buses, because the bus is at SE0 when it is idle, a different method is used to indicate the end of the packet. For high-speed, the transmitter induces a bit stuff error to indicate the end of the packet. So if the line state before the EOP is J, the transmitter will send 8 bits of K. The exception to this is the high-speed SOF EOP, in which case the high-speed EOP is extended to 40 bits. This is done for bus disconnect detection.
The PID is the first byte of valid data sent across the bus, and it encodes the packet type. The PID may be followed by anywhere from 0 to 1026 bytes, depending on the packet type. The PID byte is self-checking; in order for the PID to be valid, the last 4 bits must be a one’s complement of the first 4 bits. If a received PID fails its check, the remainder of the packet will be ignored by the USB device.
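The PID self-check can be sketched as below. The function names are hypothetical, and the sketch assumes the byte has been assembled in the usual least-significant-bit-first order, so the first four bits on the wire (the PID type) occupy the low nibble and the last four bits (the check field) occupy the high nibble. The 4-bit codes used in the example, 0010 for ACK and 1001 for IN, are taken from the USB 2.0 specification rather than from the text above.

```python
def make_pid_byte(pid_type: int) -> int:
    """Combine a 4-bit PID type with its one's complement check field."""
    if not 0 <= pid_type <= 0x0F:
        raise ValueError("PID type must fit in 4 bits")
    return ((~pid_type & 0x0F) << 4) | pid_type


def pid_is_valid(pid_byte: int) -> bool:
    """True if the high nibble is the one's complement of the low nibble."""
    low = pid_byte & 0x0F
    high = (pid_byte >> 4) & 0x0F
    return high == (~low & 0x0F)


ack = make_pid_byte(0b0010)
print(hex(ack), pid_is_valid(ack))  # 0xd2 True
print(pid_is_valid(ack ^ 0x01))     # False: a single flipped bit is detected
```

Because every bit of the type has a complemented partner, any single-bit error in the PID byte fails the check and the packet is ignored.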
There are four types of PID which are described in Table 2.
PID Type | PID Name | Description |
Token | OUT | Host to device transfer |
Token | IN | Device to Host transfer |
Token | SOF | Start of Frame marker |
Token | SETUP | Host to device control transfer |
Data | DATA0 | Data packet |
Data | DATA1 | Data packet |
Data | DATA2 | High-Speed Data packet |
Data | MDATA | Split/High-Speed Data packet |
Handshake | ACK | The data packet was received error free |
Handshake | NAK | Receiver cannot accept data or the transmitter could not send data |
Handshake | STALL | Endpoint halted or control pipe request is not supported |
Handshake | NYET | No response yet |
Special | PRE | Preamble to full-speed hub for low-speed traffic |
Special | ERR | Error handshake for Split Transaction |
Special | SPLIT | Preamble to high-speed hub for low/full-speed traffic |
Special | PING | High-speed flow control token |
Special | EXT | Protocol extension token |
The format of the IN, OUT, and SETUP Token packets is shown in Figure 11. The format of the SOF packet is shown in Figure 12. The format of the Data packets is shown in Figure 13. Lastly, the format of the Handshake packets is shown in Figure 14.
Data Transactions
Data transactions occur in three phases: Token, Data, and Handshake.
All communication on the USB is host-directed. In the Token phase, the host will generate a Token packet which will address a specific device/endpoint combination. A Token packet can be IN, OUT, or SETUP.
IN | The host is requesting data from the addressed device/endpoint. |
OUT | The host is sending data to the addressed device/endpoint. |
SETUP | The host is transmitting control information to the device. |
In the data phase, the transmitter will send one data packet. For IN requests, the device may send a NAK or STALL packet during the data phase to indicate that it isn’t able to service the token that it received.
Finally, in the Handshake phase the receiver can send an ACK, NAK, or STALL indicating the success or failure of the transaction.
All of the transfers described above follow this general scheme with the exception of the Isochronous transfer. In this case, no Handshake phase occurs because it is more important to stream data out in a timely fashion. It is acceptable to drop packets occasionally and there is no need to waste time by attempting to retransmit those particular packets.
Control Transfers
Control transfers are a group of transactions that occur on the control pipe. The control pipe is the only type of pipe which is allowed to use SETUP transactions. A control transfer consists of at least two stages called the Setup Stage and the Status Stage. Optionally, control transfers may also include a Data Stage.
The Setup Stage always consists of a single SETUP transaction. This transaction contains 8 bytes of data, some of which specify the length of the control transfer and its direction. The direction may be either host-to-device or device-to-host. If the length is not zero, then the control transfer will have a Data Stage. The Data Stage is always composed of either IN transactions or OUT transactions, depending on the direction of the control transfer; it is never a mix of the two. Lastly, the Status Stage consists of an IN transaction if the control transfer was host-to-device, or an OUT transaction if the control transfer was device-to-host. The Status Stage may end in an ACK if the function completed successfully, or a STALL if the function had an error. It is also possible to see a transaction STALL in the Data Stage if the device is unable to send or receive the requested data.
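The stage rules above can be sketched in a few lines. This example assumes the standard 8-byte SETUP layout from the USB 2.0 specification (bmRequestType, bRequest, wValue, wIndex, wLength, all little-endian), with the direction in bit 7 of bmRequestType. The function name and the returned dictionary are hypothetical.

```python
import struct

def describe_control_transfer(setup_bytes):
    """Decode the 8-byte SETUP payload and list the stages that follow."""
    if len(setup_bytes) != 8:
        raise ValueError("SETUP data is always 8 bytes")
    bm_request_type, b_request, w_value, w_index, w_length = struct.unpack(
        "<BBHHH", setup_bytes)
    device_to_host = bool(bm_request_type & 0x80)  # bit 7 gives the direction
    stages = ["SETUP"]
    if w_length:
        stages.append("DATA (IN)" if device_to_host else "DATA (OUT)")
    # Status runs opposite to the Data Stage; with no Data Stage it is an IN.
    status_is_out = device_to_host and w_length > 0
    stages.append("STATUS (OUT)" if status_is_out else "STATUS (IN)")
    return {"request": b_request, "length": w_length, "stages": stages}
```

For instance, the bytes `80 06 00 01 00 00 12 00` (a GET_DESCRIPTOR request for 18 bytes) would yield a SETUP, an IN Data Stage, and an OUT Status Stage.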
Polling Transactions
When a host requests or sends data, the device may not be able to service the request. This could occur if the device has no new information to provide to the host or is too busy to send or receive any data. In these situations the device will NAK the host. If the data transfer is a Control or Bulk transfer, the host will retry the transaction. However, if it is an Isochronous or Interrupt transfer, it will not retry the transaction.
On a full or low-speed bus, if the transaction is repeated, it is repeated in its entirety. This is true regardless of the direction of the data transfer. If the host is requesting information, it will continue to send IN tokens until the device sends data. Until then, the device responds with a NAK, leading to the multitude of IN+NAK pairs that are commonly encountered on a bus. This does not have much consequence as an IN token is only 3 bytes and the NAK is only 1 byte. However, if the host is transmitting data there is the potential for graver consequences. For example, if the host attempted to send 64 bytes of data to a device, but the device responded with a NAK, the host will retry the entire data transaction. This requires sending the entire 64-byte data payload repeatedly until the device responds with an ACK. This has the potential to waste a significant amount of bandwidth. It is for this reason that high-speed hosts have an additional feature when the device signals the inability to accept any more data.
When a high-speed host receives a NAK after transmitting data, instead of retransmitting the entire transaction, it simply sends a 3-byte PING token to poll the device and endpoint in question. (Alternatively, if the device responds to the OUT+DATA with a NYET handshake, it means that the device accepted the data in the current transaction but is not ready to accept additional data, and the host should PING the device before transmitting more data.) The host will continue to PING the device until it responds with an ACK, which indicates that the device is ready to receive information. At that point, the host will transmit a packet in its entirety.
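To put rough numbers on the bandwidth argument, the sketch below counts the bytes spent on an OUT transfer that is NAKed several times before it can be delivered. It uses the 3-byte token and 1-byte handshake sizes quoted above, assumes a data packet is one PID byte plus the payload plus a 2-byte CRC16, and ignores SYNC and EOP signaling. The function name and the accounting model are assumptions made for illustration.

```python
TOKEN_BYTES = 3      # OUT or PING token
HANDSHAKE_BYTES = 1  # ACK, NAK, or NYET

def wasted_bytes(payload_len, naks, high_speed):
    """Bytes spent before the OUT+DATA that is finally accepted."""
    if naks == 0:
        return 0
    data_packet = 1 + payload_len + 2  # PID + payload + CRC16
    failed_out = TOKEN_BYTES + data_packet + HANDSHAKE_BYTES
    if not high_speed:
        # Full/low speed: every NAKed attempt resends the whole payload.
        return naks * failed_out
    # High speed: one failed OUT, then (naks - 1) PING+NAK polls,
    # then one PING+ACK before the payload is sent again.
    ping_poll = TOKEN_BYTES + HANDSHAKE_BYTES
    return failed_out + naks * ping_poll
```

With a 64-byte payload and ten NAKs, this model gives 710 bytes of repeated traffic on a full-speed bus versus 111 bytes when PING is used.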
Hub Transactions
Hubs make it possible to expand the number of possible devices that can be attached to a single host. There are two types of hubs that are commercially available for wired USB: full-speed hubs and high-speed hubs. Both types of hubs have mechanisms for dealing with downstream devices that are not of their speed.
Full-speed hubs can, at most, transmit at 12 Mbps. This means that all high-speed devices that are plugged into a full-speed hub are automatically downgraded to full-speed data rates. On the other hand, low-speed devices are not upgraded to full-speed data rates. In order to send data to low-speed devices, the hub must actually pass slower moving data signals to those devices. The host (or high-speed hub) is the one that generates these slower moving signals on the full-speed bus. Ordinarily the low-speed ports on the hub are quiet. When a low-speed packet needs to be sent downstream, it is prefaced with a PRE PID. This opens up the low-speed ports. Note that the PRE is sent at full-speed data rates, but the following transaction is transmitted at low-speed data rates.
High-speed hubs only communicate at 480 Mbps with a high-speed host. They do not downgrade the link between the host and hub to slower speeds. However, high-speed hubs must still deal with slower devices being downstream of them. High-speed hubs do not use the same mechanism as full-speed hubs. There would be a tremendous cost on bandwidth to other high-speed devices on the bus if low-speed or full-speed signaling rates were used between the host and the hub of interest. Thus, in order to save bandwidth, high-speed hosts do not send the PRE token to high-speed hubs, but rather a SPLIT token. The SPLIT token is similar to the PRE in that it indicates to a hub that the following transaction is for a slower speed device. However, the data following the SPLIT is transmitted to the hub at high-speed data rates and does not choke the high-speed bus.
Figure 16: Split Bulk Transactions
When full/low-speed USB traffic is sent through a high-speed USB hub, the transactions are preceded by a SPLIT token to allow the hub to asynchronously handle the full/low-speed traffic without blocking other high-speed traffic from the host. In this example, bulk packets for a full-speed device are being sent through the high-speed hub. Multiple CSPLIT+IN+NYET transactions can occur on the bus until the high-speed hub is ready to return the DATA from the downstream full/low-speed device.
Although all SPLIT transactions have the same PID, there are two over-arching types of SPLITs: Start SPLITs (SSPLIT) and Complete SPLITs (CSPLIT). An SSPLIT is only used the first time that the host wishes to send a given transaction to the device. Following that, the host polls the hub for the device’s response with CSPLITs. The hub may respond many times with NYET before supplying the host with the device’s response. Once this transaction is complete, the host will begin the next hub transaction with an SSPLIT. Figure 16 illustrates an example of a hub transaction.
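The polling pattern described above can be written out as a short sketch that lists the packets seen on the high-speed side of the hub for one split bulk IN. The ACK that closes the start split is an assumption about the bulk IN case, and the function name is hypothetical.

```python
def split_bulk_in_sequence(nyet_count):
    """List the packets on the high-speed bus for one split bulk IN."""
    # Start split: the hub accepts the transaction for the slower device.
    packets = ["SSPLIT", "IN", "ACK"]
    # Complete splits: the hub answers NYET until the result is ready.
    for _ in range(nyet_count):
        packets += ["CSPLIT", "IN", "NYET"]
    # Final complete split: the hub returns the downstream device's data.
    packets += ["CSPLIT", "IN", "DATA"]
    return packets
```

Calling `split_bulk_in_sequence(2)` lists one SSPLIT group followed by three CSPLIT groups, the last of which carries the data.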
Start-of-Frame Transactions
Start-of-Frame (SOF) transactions are issued by the host at precisely timed intervals. These tokens do not cause any device to respond, and can be used by devices for timing reasons. The SOF provides two pieces of timing information. Because of the precisely timed intervals of SOFs, when a device detects an SOF it knows that the interval time has passed. All SOFs also include a frame number. This is an 11-bit value that is incremented on every new frame.
SOFs are also used to keep devices from going into suspend. Devices will go into suspend if they see an idle bus for an extended period of time. By providing SOFs, the host is issuing traffic on the bus and keeping devices from entering their suspended state.
Full-speed hosts will send 1 SOF every millisecond. High-speed hosts divide the frame into 8 microframes, and send an SOF at each microframe (i.e., every 125 microseconds). However, the high-speed host will only increment the frame number after an entire frame has passed. Therefore, a high-speed host will repeat the same frame number 8 times before incrementing it.
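As a small worked example of the numbering above, the sketch below maps a running microframe count to the 11-bit frame number carried in the corresponding high-speed SOF. It assumes the count starts at zero with frame number 0; the function names are illustrative only.

```python
MICROFRAMES_PER_FRAME = 8
FRAME_NUMBER_BITS = 11
MICROFRAME_US = 125

def sof_frame_number(microframe_index):
    """Frame number carried by the SOF for the given microframe count."""
    frame = microframe_index // MICROFRAMES_PER_FRAME
    return frame % (1 << FRAME_NUMBER_BITS)  # 11-bit counter wraps at 2048

def sof_time_us(microframe_index):
    """Time of the SOF, in microseconds, relative to microframe 0."""
    return microframe_index * MICROFRAME_US
```

Microframes 0 through 7 all report frame 0, microframe 8 reports frame 1, and the counter wraps back to 0 after 2048 frames (about 2.048 seconds).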
Low-speed devices are never issued SOFs as it would require too much bandwidth on an already slower-speed bus. Instead, to keep low-speed devices from going into suspend, hosts will issue a keep-alive every millisecond. These keep-alives are short SE0 events on the bus that last for approximately 1.33 microseconds. They are not interpreted as valid data, and have no associated PID.
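The 1.33 microsecond figure can be checked with a line of arithmetic, assuming the SE0 portion of a keep-alive lasts two bit times at the low-speed rate of 1.5 Mbps. The constant and function names are hypothetical.

```python
LOW_SPEED_BIT_RATE = 1.5e6  # bits per second
KEEP_ALIVE_SE0_BITS = 2     # assumed SE0 length, in low-speed bit times

def keep_alive_se0_us():
    """Duration of the keep-alive SE0, in microseconds."""
    return KEEP_ALIVE_SE0_BITS / LOW_SPEED_BIT_RATE * 1e6
```

Two bit times at 1.5 Mbps come to roughly 1.33 microseconds, a small fraction of each 1 millisecond frame.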
Extended Token Transactions
The Link Power Management addendum to the USB 2.0 Specification has expanded the number of PIDs through the use of the previously reserved PID, 0xF0. The extended token format is a two-phase transaction that begins with a standard token packet that has the EXT PID. Following this packet is the extended token packet, which takes a similar form. It begins with an 8-bit SubPID and ends with a 5-bit CRC. The 11 remaining bits in the middle have different meanings depending on the SubPID.
Figure 17: Extended Token Transaction
In an extended token transaction, the token phase of the transaction has two token packets. The first packet uses the EXT PID. The content of the second packet will depend on the particular SubPID specification. The subsequent Data and Handshake phases will depend on the value of the SubPID as well.
Following this token phase, the device will respond with the appropriate data or handshake, depending on the protocol associated with that SubPID. Currently, the only defined SubPID is for link power management (LPM). For more details, please refer to the Link Power Management addendum.
USB 3.0 Packets
USB 3.0 supports the same types of data transfers as USB 2.0: control, interrupt, bulk, and isochronous. However, the packet structure has changed to support the new features in USB 3.0.
USB 3.0 General Packet Structures
Packets in USB 3.0 generally come in 3 different patterns.
Header Packet
Header Packets consist of three parts: header packet framing, packet header, and a link control word. Note that "SHP" is a K-symbol which stands for "start header packet." The header is protected by CRC-16, and the link control word is protected by CRC-5.
Data Payload Packet
Data Payload Packets send application data and are protected by CRC-32. Note that "SDP" stands for "start data packet payload."
Link Command Packet
Link Command Packets are used to control various link-specific features, including low power states and flow control. A Link Command Packet actually consists of two identical Link Command Words, where each Link Command Word is protected by a CRC-5. Note that "SLC" stands for "start link command."
Link Management Packets (LMP)
Link Management Packets (LMP) are a type of header packet used to manage the link between two ports (Figure 18). Because these packets are used to manage a single link, they only travel between the two link partners and therefore require no addressing information.
Figure 18: Link Management Packet (LMP)
Link Management Packets are used to manage the link between two link partners. Image courtesy of USB Implementers Forum.
The LMP commands to manage a link are listed below. Please consult the USB 3.0 specifications for more details.
- Set Link Function
- U2 Inactivity Timeout
- Vendor Device Test
- Port Capability
- Port Configuration
- Port Configuration Response
Transaction Packets (TP)
Transaction Packets (TP) are a type of header packet used to control the flow of data packets end-to-end between the host and device (Figure 19). Since these packets may traverse a number of links, each TP has a route string which is used by hubs to route the packet directly to the intended device.
Figure 19: Transaction Packet (TP)
Transaction Packets are used to control the flow of packets between the host and device. Image courtesy of USB Implementers Forum.
TPs do not contain application data and have a number of different subtypes:
- Acknowledgement (ACK)
- Not Ready (NRDY)
- Endpoint Ready (ERDY)
- STATUS
- STALL
- Device Notification (DEV_NOTIFICATION)
- PING
- PING_RESPONSE
Data Packets (DP)
Data Packets (DP) are used to transmit application data and are comprised of two parts: a data packet header (DPH) and a data packet payload (DPP) (Figure 20).
Figure 20: Data Packet (DP)
Data Packets are used to transmit application data between the host and device. A Data Packet is composed of a Data Packet Header (DPH) and the Data Packet Payload (DPP). Image courtesy of USB Implementers Forum.
Since data is being sent between the host and device, DPs have a route string to direct them to the intended device.
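Both TPs and DPs carry a route string. As a hedged sketch of how a hub chain might interpret it, the example below assumes the USB 3.0 layout of a 20-bit value holding one 4-bit downstream port number per hub tier, with the lowest nibble used by the hub nearest the host and a zero nibble ending the path. The function name is hypothetical.

```python
def route_string_to_ports(route_string):
    """Split a 20-bit route string into per-tier downstream port numbers."""
    if not 0 <= route_string < (1 << 20):
        raise ValueError("route string is a 20-bit value")
    ports = []
    for tier in range(5):  # at most five hub tiers, 4 bits each
        port = (route_string >> (4 * tier)) & 0xF
        if port == 0:
            break  # zero means the packet stops at this tier
        ports.append(port)
    return ports
```

Under these assumptions, a route string of 0x00032 describes a device behind port 2 of the first hub and port 3 of the second hub.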
Isochronous Timestamp Packets (ITP)
Isochronous Timestamp Packets (ITP) are used to send timestamps to all devices for synchronization (Figure 21). ITPs are the only packets that are broadcast by the host to all active devices. Since this packet is broadcast, it does not require a route string.
Figure 21: Isochronous Timestamp Packet (ITP)
Isochronous Timestamp Packets send timestamps to active devices, which use them for synchronization. Image courtesy of USB Implementers Forum.
Only hosts are allowed to send ITPs, and only when the host port is already in the U0 state. Devices are not required to respond to the ITP.
Enumeration and Descriptors
When a device is plugged into a host PC, the device undergoes Enumeration. This means that the host recognizes the presence of the device and assigns it a unique 7-bit device address. The host PC then queries the device for its descriptors, which contain information about the specific device. There are various types of descriptors as outlined below.
Figure 22: USB Descriptors
Hierarchy of descriptors of a USB device. A device has a single Device descriptor. The Device descriptor can have multiple Configuration descriptors, but only a single one can be active at a time. The Configuration descriptor can define one or more Interface descriptors. Each of the Interface descriptors can have one or more alternate settings, but only one setting can be active at a time. The Interface descriptor defines one or more Endpoints.
- Device Descriptor: Each USB device can only have a single Device Descriptor. This descriptor contains information that applies globally to the device, such as serial number, vendor ID, product ID, etc. The device descriptor also has information about the device class. The host PC can use this information to help determine what driver to load for the device.
- Configuration Descriptor: A device descriptor can have one or more configuration descriptors. Each of these descriptors defines how the device is powered (e.g. bus powered or self powered), the maximum power consumption, and what interfaces are available in this particular setup. The host can choose whether to read just the configuration descriptor or the entire hierarchy (configuration, interfaces, and alternate interfaces) at once.
- Interface Descriptor: A configuration descriptor defines one or more interface descriptors. Each interface number can be subdivided into multiple alternate interfaces that allow finer control over the characteristics of a device. The host PC selects a particular alternate interface depending on what functions it wishes to access. The interface also has class information which the host PC can use to determine what driver to use.
- Endpoint Descriptor: An interface descriptor defines one or more endpoints. The endpoint descriptor is the last leaf in the configuration hierarchy and it defines the bandwidth requirements, transfer type, and transfer direction of an endpoint. For transfer direction, an endpoint is either a source (IN) or sink (OUT) of the USB device.
- String Descriptor: Some of the descriptors mentioned above can include a string descriptor index number. The host PC can then request the Unicode-encoded string for a specified index. This provides the host with human-readable information about the device, including strings for manufacturer name, product name, and serial number.
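When the host reads the whole configuration hierarchy at once, the descriptors arrive as one concatenated block. The sketch below walks such a block, assuming the standard USB 2.0 framing in which every descriptor starts with a length byte and a type byte, and decodes the direction and transfer type of an endpoint. `SAMPLE_CONFIG` is a made-up, trimmed example rather than data from a real device, and the function names are hypothetical.

```python
DESCRIPTOR_TYPES = {1: "device", 2: "configuration", 3: "string",
                    4: "interface", 5: "endpoint"}
TRANSFER_TYPES = ["control", "isochronous", "bulk", "interrupt"]

# Hypothetical blob: one configuration, one interface, one endpoint.
SAMPLE_CONFIG = bytes.fromhex(
    "09 02 19 00 01 01 00 80 32"   # configuration, wTotalLength = 25
    "09 04 00 00 01 03 01 02 00"   # interface, class 0x03
    "07 05 81 03 04 00 0A"         # endpoint 1 IN, interrupt, 4 bytes
)

def walk_descriptors(blob):
    """Yield (type_name, raw_bytes) for each descriptor in the blob."""
    offset = 0
    while offset + 2 <= len(blob):
        length, dtype = blob[offset], blob[offset + 1]
        if length < 2 or offset + length > len(blob):
            raise ValueError("malformed descriptor at offset %d" % offset)
        yield DESCRIPTOR_TYPES.get(dtype, "other"), blob[offset:offset + length]
        offset += length

def describe_endpoint(raw):
    """Decode number, direction, and transfer type of an endpoint descriptor."""
    address, attributes = raw[2], raw[3]
    return {
        "number": address & 0x0F,
        "direction": "IN" if address & 0x80 else "OUT",  # bit 7 set means IN
        "type": TRANSFER_TYPES[attributes & 0x03],
        "max_packet": (raw[4] | (raw[5] << 8)) & 0x07FF,
    }
```

Iterating over `walk_descriptors(SAMPLE_CONFIG)` visits the configuration, interface, and endpoint descriptors in the order they sit in the hierarchy.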
Device Class
USB devices vary greatly in terms of function and communication requirements. Some devices are single-purpose, such as a mouse or keyboard. Other devices may have multiple functionalities that are accessible via USB such as a printer/scanner/fax device.
The USB-IF Device Working Group defines a discrete number of device classes. The idea was to simplify software development by specifying a minimum set of functionality and characteristics that is shared by a group of devices and interfaces. Devices of the same class can all use the same USB driver. This greatly simplifies the use of USB devices and saves the end-user the time and hassle of installing a driver for every single USB device that is connected to their host PC.
For example, input devices such as mice, keyboards and joysticks are all part of the HID (Human Interface Device) class. Another example is the Mass Storage class which covers removable hard drives and keychain flash disks. All of these devices use the same Mass Storage driver which simplifies their use.
However, a device does not necessarily need to belong to a specific device class. In these cases, the USB device will require its own USB driver that the host PC must load to make the functionality available to the host.
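A host-side lookup of this kind might be sketched as follows. The class codes shown (0x03 for HID, 0x08 for Mass Storage, 0x09 for Hub, 0xFF for vendor-specific, and 0x00 in the device descriptor meaning "see the interfaces") are the USB-IF assigned values; the table is deliberately incomplete and the function name is hypothetical.

```python
CLASS_NAMES = {
    0x03: "HID",
    0x08: "Mass Storage",
    0x09: "Hub",
    0xFF: "Vendor Specific",
}

def class_names_for(device_class, interface_classes):
    """Return the class names a host could use to choose drivers."""
    # A device class of 0x00 defers class information to each interface.
    codes = interface_classes if device_class == 0x00 else [device_class]
    return [CLASS_NAMES.get(code, "unknown (0x%02X)" % code) for code in codes]
```

A vendor-specific result corresponds to the case described above, where the device needs its own driver.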
On-The-Go (OTG)
The OTG supplement to the USB 2.0 spec provides methods for mobile devices to communicate with each other, actively switch the role of host and device, and also request sessions from each other when power to the USB is removed.
The initial role of host and device is determined entirely by the USB connector itself. All OTG-capable peripherals will have a 5-pin Micro-AB receptacle which can receive either the Micro-A or Micro-B plug. If the peripheral receives the Micro-A plug, then it behaves as the host. If it receives the Micro-B plug, then it behaves as the device. However, there may be certain situations where a peripheral receives the Micro-B plug, but needs to behave as the host. Rather than request that the user swap the cable orientation, the two peripherals have the ability to swap the roles of host and device through the Host Negotiation Protocol (HNP).
The HNP begins when the A-device finishes using the bus and stops all bus activity. The B-device detects this and releases its pull-up resistor. When the A-device detects the SE0, it responds by activating its pull-up. Once the B-device detects this condition, the B-device issues a reset and begins standard USB communication as the host.
In order to conserve power, A-devices are allowed to stop providing power to the USB. However, there could be situations where the B-device wants to use the bus and VBUS is turned off. It is for this reason that the OTG supplement describes a method for allowing the B-device to request a session from the A-device. Upon successful completion of the Session Request Protocol (SRP), the A-device will power the bus and continue standard USB transactions.
The SRP is broken up into two stages. From a disconnected state, the B-device must begin an SRP by driving one of its data lines high for a sufficient duration. This is called data-line pulsing. If the A-device does not respond to this, the B-device will drive the VBUS above a specified threshold and release it, thereby completing VBUS pulsing. If the A-device still does not begin a session, the B-device may start the SRP over again, provided the correct initial conditions are met.
For more details on OTG, please see the On-The-Go Supplement to the USB 2.0 Specification.