Raspberry Pi 3

From ScienceZero
  • Quad Core 1.2GHz Broadcom BCM2837 64bit CPU
  • 1GB LPDDR2 RAM
  • BCM43438 WiFi 802.11 b/g/n (2.4GHz) and Bluetooth 4.1 LE
  • 40-pin extended GPIO
  • 4 × USB 2.0
  • Full size HDMI 1.4 socket with CEC and 3.5mm composite video jack
  • CSI camera port for connecting a Raspberry Pi camera
  • DSI display port for connecting a Raspberry Pi touchscreen display
  • 3.5 mm audio jack (Shared with composite video)
  • 1 × UART, 1 × SPI, 2 × I2C, PCM/I2S, 2 × PWM
  • Micro SD card slot
  • 10/100M Ethernet

BCM2837 System On Chip

CPU

4 × ARM Cortex A53

  • 8-stage pipelined processor with 2-way superscalar, in-order execution pipeline
Fetch | Decode1 | Decode2 | Decode3 | Operand | Execute1 | Execute2 | Retire
  • DSP and NEON SIMD extensions are mandatory per core
  • VFPv4 Floating Point Unit onboard (per core)
  • Hardware virtualization support
  • TrustZone security extensions
  • 64-byte cache lines
  • 16kB Level 1 instruction cache, 2-way set associative
  • 16kB Level 1 data cache, 4-way set associative
  • 512kB Level 2 cache, 512-entry TLB
  • 4 KiB conditional branch predictor, 256-entry indirect branch predictor

AArch32 features

  • Has 15 general-purpose 32-bit registers (R0-R14).
  • PC is mapped to R15

AArch64 features

  • New instruction set, A64
  • Has 31 general-purpose 64-bit registers (X0-X30); the low 32 bits of each are accessible as W0-W30.
  • Register number 31 acts as either the zero register (WZR, XZR) or the stack pointer (SP), depending on the instruction.
  • The program counter (PC) is no longer directly accessible as a register.
  • Instructions are still 32 bits long and mostly the same as A32 (with LDM/STM instructions and most conditional execution dropped).
  • Has paired loads/stores (in place of LDM/STM).
  • No predication for most instructions (except branches).
  • Most instructions can take 32-bit or 64-bit arguments.
  • Addresses assumed to be 64-bit.
  • Advanced SIMD (NEON) enhanced
  • Has 32 × 128-bit registers (up from 16), also accessible via VFPv4.
  • IEEE 754 compliant double-precision floating point.
  • AES encrypt/decrypt and SHA-1/SHA-2 hashing instructions also use these registers.
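
The relationship between the 32-bit W view and the 64-bit X view of a general-purpose register can be illustrated with a small model. This is only a sketch: the helper names write_w and read_w are made up for this example, and it relies on the A64 rule that a write to Wn zero-extends into Xn.

```python
MASK32 = 0xFFFFFFFF

def read_w(x_value: int) -> int:
    """Return the Wn view (low 32 bits) of a 64-bit Xn value."""
    return x_value & MASK32

def write_w(x_old: int, value: int) -> int:
    """Model a write to Wn: the upper 32 bits of Xn are cleared,
    whatever Xn held before (x_old is deliberately unused)."""
    return value & MASK32

x0 = 0x1122334455667788
print(hex(read_w(x0)))            # low half of X0 seen through W0
x0 = write_w(x0, 0xDEADBEEF)
print(hex(x0))                    # upper half is now zero
```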

Links

VideoCore

Dual 400MHz VideoCore IV, 3D core at 300MHz. OpenGL ES 2.0, hardware-accelerated OpenVG, and 1080p30 H.264 high-profile decode. Capable of 1Gpixel/s, 1.5Gtexel/s or 24GFLOPs with texture filtering and DMA infrastructure.

Links

Peripherals

Mini UART

The UART itself has no throughput limitation and can run at up to about 31 Mega baud with a 250 MHz system clock. The mini UART is intended to be used as a console. It must be enabled before it can be used, and the correct GPIO function mode should be selected before enabling it.

The mini UART has the following features:

  • 7 or 8 bit operation.
  • 1 start and 1 stop bit.
  • No parities.
  • Break generation.
  • 8 symbols deep FIFOs for receive and transmit.
  • SW controlled RTS, SW readable CTS.
  • Auto flow control with programmable FIFO level.
  • 16550 like registers.
  • Baudrate derived from system clock.

It does NOT have the following capabilities:

  • DMA
  • Break detection
  • Framing error detection
  • Parity bit
  • Receive Time-out interrupt
  • DCD, DSR, DTR or RI signals

The implemented UART is not a 16550-compatible UART. However, as far as possible the first 8 control and status registers are laid out like those of a 16550 UART. All 16550 register bits which are not supported can be written but will be ignored and read back as 0. All control bits for simple UART receive/transmit operations are available.

Mini UART implementation details

The UART1_CTS and UART1_RX inputs are synchronised and will take 2 system clock cycles before they are processed. The module does not check for any framing errors. After receiving a start bit and 8 (or 7) data bits the receiver waits for one half bit time and then starts scanning for the next start bit. The mini UART does not check if the stop bit is high or wait for the stop bit to appear. As a result of this a UART1_RX input line which is continuously low (a break condition or an error in connection or GPIO setup) causes the receiver to continuously receive 0x00 symbols. The mini UART uses 8-times oversampling. The Baudrate can be calculated from:

baudrate = system_clock_freq / (8 * (baudrate_reg + 1))

If the system clock is 250 MHz and the baud register is zero, the baud rate is 31.25 Mega baud (25 Mbit/s or 3.125 Mbyte/s of payload, since each 8-bit symbol is framed by one start and one stop bit). The lowest baud rate with a 250 MHz system clock is 476 baud.

When writing to the data register only the least significant 8 bits are taken; all other bits are ignored. When reading from the data register only the least significant 8 bits are valid; all other bits are zero.
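
The baud rate formula above can be checked numerically. The following is a minimal sketch, assuming a 250 MHz system clock as in the text and a 16-bit baud rate register (consistent with the 476 baud minimum quoted above). The function names are invented for this example.

```python
SYSTEM_CLOCK_HZ = 250_000_000   # assumed system clock, as in the text
BAUD_REG_MAX = 0xFFFF           # assumption: the baud register is 16 bits wide

def baud_from_reg(baud_reg: int, clock_hz: int = SYSTEM_CLOCK_HZ) -> float:
    """baudrate = system_clock_freq / (8 * (baudrate_reg + 1))"""
    return clock_hz / (8 * (baud_reg + 1))

def reg_from_baud(baud: int, clock_hz: int = SYSTEM_CLOCK_HZ) -> int:
    """Nearest register value for a requested baud rate."""
    reg = round(clock_hz / (8 * baud)) - 1
    if not 0 <= reg <= BAUD_REG_MAX:
        raise ValueError("baud rate not reachable with this clock")
    return reg

print(baud_from_reg(0))              # fastest rate
print(baud_from_reg(BAUD_REG_MAX))   # slowest rate
reg = reg_from_baud(115200)
print(reg, baud_from_reg(reg))       # register value and the rate actually produced
```

Because the divisor is an integer, the rate actually produced usually differs slightly from the requested one; printing both makes the error visible.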

Timers

Free running 64 bit timer

The default rate is 1 MHz, derived from a 19.2 MHz crystal

timer = (crystal * prescaler / (2^31))
prescaler = ((2^31) * timer) / crystal

TIMERBase = 0x40000000
mov     x1,#TIMERBase           ; Base address of the timer registers
mov     x20,#0x42AA0000
movk    x20,#0x0000AAAB         ; ((2 147 483 648) * 10 000 000) /  19 200 000 = 1118481066.666... = 0x42AAAAAB (rounded up)
str     w20,[x1,#0x8]           ; Set timer to 10 MHz

; Always access the two words in this order so that carry propagation between them is handled correctly
; When writing, both words must be written for the change to take effect; the least significant word can be read at any time
ldr     w20,[x1,#0x1c]          ; Least significant 32 bits of free running timer
ldr     w21,[x1,#0x20]          ; Most significant 32 bits of free running timer
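
The prescaler arithmetic used in the snippet above can be reproduced on a host machine. This is a sketch under the assumptions stated in the text (19.2 MHz crystal, 31-bit scaling factor). The names prescaler_for, timer_rate and combine are hypothetical helpers, not part of any firmware API.

```python
CRYSTAL_HZ = 19_200_000   # crystal frequency given in the text

def prescaler_for(timer_hz: int, crystal_hz: int = CRYSTAL_HZ) -> int:
    """prescaler = (2^31 * timer) / crystal, rounded to the nearest integer."""
    value = (2**31 * timer_hz + crystal_hz // 2) // crystal_hz
    if not 0 < value <= 2**31:
        raise ValueError("timer rate must be between 0 and the crystal rate")
    return value

def timer_rate(prescaler: int, crystal_hz: int = CRYSTAL_HZ) -> float:
    """timer = crystal * prescaler / 2^31"""
    return crystal_hz * prescaler / 2**31

def combine(low_word: int, high_word: int) -> int:
    """Join the two 32-bit timer reads into one 64-bit count."""
    return (high_word << 32) | low_word

p = prescaler_for(10_000_000)
print(hex(p), hex(p >> 16), hex(p & 0xFFFF))   # full value, then the mov and movk halves
print(timer_rate(p))                           # resulting rate, very close to 10 MHz
```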

GPIO

Pinout

LAN9514 Hi-Speed USB 2.0 hub and 10/100 Ethernet controller

Links

BCM43438 wifi/Bluetooth

Links

Bare metal programming

Boot sequence

  1. Boots from internal ROM
  2. Reads the SD card and looks for the additional GPU-specific boot files bootcode.bin and start.elf in the root directory of the first FAT32 partition
  3. Loads config.txt, where you can change the ARM clock frequency, the address at which kernel.img is loaded, and many other settings
  4. Loads the ARM boot binary
    1. Check for kernel8.img and if found load it and boot in 64bit mode. Default address 0x00000000, SP does not contain a valid address.
    2. If not found check for kernel8-32.img and if found load it and boot in 32bit mode
    3. If not found check for kernel7.img and if found load it and boot in 32bit mode
    4. If not found check for kernel.img and if found load it and boot in 32bit mode
  5. Releases reset on the ARM so that it runs from the address where the code was loaded
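
The kernel image search order in step 4 can be written down as a small lookup. This is only an illustration of the order listed above, not the firmware's actual code. The function name select_kernel and the idea of passing the card contents as a set of file names are assumptions made for this example.

```python
# (file name, ARM execution mode in bits), in the order the list above checks them
KERNEL_SEARCH_ORDER = [
    ("kernel8.img", 64),
    ("kernel8-32.img", 32),
    ("kernel7.img", 32),
    ("kernel.img", 32),
]

def select_kernel(files_on_card):
    """Return (name, bits) of the first image found, or None if none is present."""
    for name, bits in KERNEL_SEARCH_ORDER:
        if name in files_on_card:
            return name, bits
    return None

card = {"bootcode.bin", "start.elf", "config.txt", "kernel7.img", "kernel.img"}
print(select_kernel(card))   # kernel7.img wins over kernel.img
```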

The memory is split between the GPU and the ARM. The default split depends on the firmware, and it can be changed in config.txt (for example to give the ARM more).

From the ARM's perspective, the 32-bit kernel.img file is loaded, by default, at address 0x8000.

Boot failure

With firmware since 20 October 2012, the green LED flash codes mean:

  • 3 flashes: start.elf not found
  • 4 flashes: start.elf not launched
  • 7 flashes: kernel.img not found
  • 8 flashes: SDRAM not recognised. You need newer start.elf firmware.

Memory access

  • Software accessing RAM directly must use physical addresses (based at 0x00000000).
  • Software accessing RAM using the DMA engines must use bus addresses (based at 0xC0000000).
  • Place a memory write barrier before the first write to a peripheral.
  • Place a memory read barrier after the last read of a peripheral.
  • If an interrupt routine reads from a peripheral the routine should start with a memory read barrier.
  • If an interrupt routine writes to a peripheral the routine should end with a memory write barrier.
  • In 64-bit mode, 64-bit data should normally be aligned on 64-bit boundaries. The stack pointer (SP) must be aligned on a 128-bit (16-byte) boundary; by using an immediate offset from SP it is still possible to reach addresses with finer alignment.
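
The address and alignment rules above can be sketched as a few helper functions. The 0xC0000000 bus base comes from the list above. The 1 GB limit on physical RAM addresses and the helper names are assumptions for this example.

```python
BUS_BASE = 0xC0000000     # bus address base used by the DMA engines (from the text)
RAM_LIMIT = 0x40000000    # assumption: physical RAM addresses lie below 1 GB

def phys_to_bus(phys: int) -> int:
    """Convert a physical RAM address to the bus address a DMA engine needs."""
    if not 0 <= phys < RAM_LIMIT:
        raise ValueError("not a physical RAM address")
    return phys | BUS_BASE

def bus_to_phys(bus: int) -> int:
    """Strip the bus base again."""
    return bus & (RAM_LIMIT - 1)

def is_aligned(address: int, bits: int) -> bool:
    """True if address sits on a boundary of the given width in bits."""
    return address % (bits // 8) == 0

print(hex(phys_to_bus(0x8000)))
print(is_aligned(0x7FFF0, 128), is_aligned(0x7FFF8, 128), is_aligned(0x7FFF8, 64))
```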

Assembly language AArch64

Useful assembly snippets.

Send 3 cores to sleep to save power and avoid concurrency problems

        mrs     x0,MPIDR_EL1    ; Multiprocessor Affinity Register (MPIDR)
        ands    x0,x0,#3        ; CPU ID (Bits 0..1)
        b.ne    sleep
        ; ... Main code running on core 0 only
        
sleep:  wfe                     ; Wait for event, this causes the core to sleep until an event is raised
        b       sleep
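
The core selection logic of the snippet above can be modelled in a few lines: the two lowest bits of MPIDR_EL1 give the CPU ID, and only the core whose ID is zero falls through to the main code. The MPIDR values used here are illustrative, not read from hardware, and the function names are made up for this sketch.

```python
def core_id(mpidr: int) -> int:
    """CPU ID is held in bits 0..1 of MPIDR_EL1, as in the 'ands x0,x0,#3' above."""
    return mpidr & 0x3

def runs_main_code(mpidr: int) -> bool:
    """Mirror of 'b.ne sleep': any non-zero ID goes to the wfe loop."""
    return core_id(mpidr) == 0

for example_mpidr in (0x80000000, 0x80000001, 0x80000002, 0x80000003):
    action = "main code" if runs_main_code(example_mpidr) else "sleep (wfe loop)"
    print(core_id(example_mpidr), action)
```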


Enable L1 data and instruction caches

       mrs     x0, s3_1_c15_c2_1       ; CPUECTLR_EL1 (CPU Extended Control Register)
       orr     x0, x0, #(0x1 shl 6)    ; The smp bit (Symmetric Multiprocessing) Required for cache coherency
       msr     s3_1_c15_c2_1, x0
       
       mrs     x0, sctlr_el3
       orr     x0, x0, #(0x1 shl 2)    ; The c bit (Data cache)
       orr     x0, x0, #(0x1 shl 12)   ; The i bit (Instruction cache)
       ;orr     x0, x0, #0x1           ; The m bit (MMU) Requires translation table to be set up
       msr     sctlr_el3, x0
       dsb sy
       isb
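
The bit positions used in the snippet above can be double-checked by computing the masks on a host machine. The constant names below are chosen for this sketch. The bit numbers are the ones used in the assembly (SMP bit 6, C bit 2, I bit 12, M bit 0).

```python
CPUECTLR_SMPEN = 1 << 6    # SMP bit, needed for cache coherency
SCTLR_M = 1 << 0           # MMU enable (left off in the snippet above)
SCTLR_C = 1 << 2           # data cache enable
SCTLR_I = 1 << 12          # instruction cache enable

def enable_caches(sctlr: int, with_mmu: bool = False) -> int:
    """Return the SCTLR value after the orr instructions above have been applied."""
    sctlr |= SCTLR_C | SCTLR_I
    if with_mmu:
        sctlr |= SCTLR_M   # only valid once translation tables are set up
    return sctlr

print(hex(CPUECTLR_SMPEN))
print(hex(enable_caches(0)))
print(hex(enable_caches(0, with_mmu=True)))
```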

C language

  • armclang --target=aarch64-arm-none-eabi -mcpu=cortex-a53 test.c

Links