Review of ICS

“Intro to Computer Systems” is a CS course held by CMU with course number CS213. Its home page says:

The ICS course provides a programmer’s view of how computer systems execute programs, store information, and communicate. It enables students to become more effective programmers, especially in dealing with issues of performance, portability and robustness. It also serves as a foundation for courses on compilers, networks, operating systems, and computer architecture, where a deeper understanding of systems-level issues is required. Topics covered include: machine-level code and its generation by optimizing compilers, performance evaluation and optimization, computer arithmetic, memory organization and management, networking technology and protocols, and supporting concurrent computation.

I note down some of the good stuff in case I forgot sometime. But mostly just a review before final test, don’t expect too much LOL.

Video resource on bilibili is: Here (2015 Fall) or Here(2017 Fall). In case you need it.
Website of this course: https://www.cs.cmu.edu/~213/
Textbook PDF: Computer Systems: A Programmer’s Perspective(2rd edition),Chinese version; 3rd edition on amazon.
Lecture notes: https://www.cs.cmu.edu/~213/schedule.html
Test PDF: https://www.cs.cmu.edu/~213/exams.html, test(answer), Moc, NE
Review Slides(Chinese): review

Overview

Ex.1: Is $x^2 \geq 0​$ ?

  • Float’s: Yes!
  • Int’s: Not quiet…
    • 40000$\times​$40000=1600000000
    • 50000$\times​$50000=-179496729 (overflow)

Ex.2: Is (x+y)+z = x+(y+z) ?

  • Float’s:
    • (1e20+ -1e20) +2.14 = 3.14
    • 1e20+ (-1e20 +2.14) = 0 (lose precision)

Sometimes this kind of issue(corner cases) can be essential! (Even though most of time it doesn’t matter)

Get to know assembly(汇编).

Get to know memory matters.

Ex.3: (Memory referencing bugs)

1547154674891

Double type has 2 bytes.( refer to a[2] & a[3])

System can be vulnerable by this.

Ex.4. (Memory system performance)

1547155431530

Those two codes’ performance differs from each other as almost 10 times on speed.

Showing as follows: (See in the Memory-Cache chapter, called “Memory Mountain”)

1547155607657

A programmer ‘s perspective.

  • thing need to know for whom really do the coding

David talks about how influential this course is. (Even a story about a bookstore in Peking University)

MORE(Course Resources)

There are more topics taught in this class about network programing including several chapter.

Seems the course reschedule at Jan 15th. I can’t access the slides now. The note ends here. (Still can, use the schedule below instead)

Schedule

from: https://www.cs.cmu.edu/~213/schedule.html

15-213/18-213/15-513: Intro to Computer Systems, Fall 2018

Notes on links

  • pptx links are to Powerpoint versions of the lectures
  • pdf links are to Adobe Acrobat versions of the lectures
  • code links are to directories containing code used for class demonstrations
  • tar links are to archive files in TAR format. Use the tar command on a linux machine to unpack these
Lecture/Recitation
Recitation 1: No recitation—Semester starts with first lecture
Overview (pptx , pdf , code)
Bits, Bytes, & Integers I (pptx , pdf , code)
Recitation 2: No recitation—Labor Day / Linux Boot Camp (pdf)
Bits, Bytes, & Integers II (pptx , pdf , code)
Floating Point (pptx , pdf)
Recitation 3: Datalab and Data Representations (pdf , handout , solution)
Machine Prog: Basics (pptx , pdf)
Machine Prog: Control (pptx , pdf)
Recitation 4: Bomb Lab (pdf , pptx , handout)
Machine Prog: Procedures (catchup-pptx , catchup-pdf , pptx , pdf)
Machine Prog: Data (pptx , pdf)
Recitation 5: Attack Lab and Stacks (pdf , pptx , activity)
Machine Prog: Advanced (pptx , pdf , code)
Code Optimization (pptx , pdf , C Bootcamp slides pdf , pptx , activity)
Recitation 6: C Review (pdf , activity)
The Memory Hierarchy (pptx , pdf)
Cache Memories (pptx , pdf)
Recitation 7: Cache Lab and blocking (pptx , pdf)
Linking (pptx , pdf , code)
ECF: Exceptions & Processes (pptx , pdf , code)
7pm - 9pm Exam Review in Rashid Auditorium
Recitation 8: Exam Review (pptx , pdf)
ECF: Signals & Nonlocal Jumps (pptx , pdf , code)
System Level I/O (pptx , pdf , code)
Recitation 9: Shell lab, processes, signals, and I/O (pdf , pptx)
Virtual Memory: Concepts (pptx , pdf)
Virtual Memory: Systems (pptx , pdf)
Recitation 10: TSHLab and Virtual memory (pptx , pdf)
Dynamic Memory Allocation: Basic (pptx , pdf)
Dynamic Memory Allocation: Advanced (pptx , pdf)
Recitation 11: Malloc lab (Part I) (pptx , pdf , code)
Network Programming (Part I) (pptx , pdf , code)
Network Programming (Part II) (pptx , pdf , code)
7pm - 8pm Malloc Bootcamp in Rashid Auditorium (pdf)
Recitation 12: Malloc lab (Part II) (pptx , pdf , code)
Concurrent programming (pptx , pdf , code)
Synchronization: Basic (pptx , pdf , code)
Recitation 13: Proxy lab (pptx , pdf)
Synchronization: Advanced (pptx , pdf , code)
No lecture—Thanksgiving
Recitation 14: Synchronization (pptx , pdf)
Thread-Level Parallelism (pptx , pdf)
Future of Computing I (pptx , pdf)
Recitation 15: Exam review
Future of Computing II (pptx , pdf)
Final exam

Exam

https://www.cs.cmu.edu/~213/exams.html

Representing&Processing of Information

Bits

  • Binary: 11110001
  • Decimal: 241
  • Hexadecimal: 0xF1

Boolean Algebra

And(&&), Or(||), Not(~/!), Exclusive-Or / Xor(^)

Opearte on bit vectors: &, |

  • can be used to represent sets
    • 01101001 {0, 3, 5, 6}
    • 76543210 ———1 means the set contains the element, 0 otherwise
Operation Name Byte Set
& Intersection 01000001 {0, 6}
| Union 01111101 {0, 2, 3, 4, 5, 6}
^ Symmetric difference 00111100 {2, 3, 4, 5}
~/! Complement 10101010 {1, 3, 5, 7}

p && *p: avoids null pointer access

Shift Operators

  • Left shift: <<
  • Right shift: >>

There is a difference between logical shift and arithmetical shift, see in the following pic: The negative num’s arithmetical right shift will fill in 1s. (Others are 0s)

1547159998158

Intergers

Unsigned & Signed

  • Unsigned Values
    • $U_{Min} = 0$
      • 000…0
    • $U_{Max} = 2^w - 1 $
      • 111…1
  • Two’s Complement Values【求负:取反加一
    • $T_{Min} = -2^{w-1}​$
      • 100…0
    • $T_{Max} = 2^{w-1} -1$
      • 011…1
  • Other Values
    • Minus 1
      • 111…1

Here is a graph: (4 bits)

1547162047772

Notice:

Conversion Visualization

1547162367485

Casting(转换)

Principle: If there is a mix of unsigned and signed in single expression, signed values implicitly cast to unsigned.

1547167241637

Examples above are for W=32: TMIN=-2,147,483,648 ; TMAX=2,147,483,647.

1
2
3
for(i = n-1; i >= 0; i--)
do_sth(i);
//if i is unsigned, this loop will run forever!
1
2
3
for(i = n-1; i - sizeof(str)>= 0; i--)
do_sth(i);
//sizeof() return an unsigned value, this loop will crash very strangely!

Extension

Sign Extension

Principle: Add all 1s for the new blank.

1547168305282

Unsigned

Principle: Add all 0s for the new blank. (Similarly)

Truncation(截断)

Unsigned: mod operation, Just drop it.
Signed: similar to mod (sign can be converted in the process)

Arithmetic

Addition

Unsigned Addition

Drop the overflowed part. (But when computing use extra bits)

1547170397354

Two’s Complement Addition

Drop the overflowed part as well. (Will get the exact correct answer)

TAdd and UAdd have identical bit-level behavior.

1547170932559

Visualizing 2’s complement addition:

1547171039590

Multiplication

Unsigned multiplication

Still drop the overflowed part.

Signed multiplication

Drop the overflowed part. (sign can be converted in the process)

For negative numbers: drop the overflowed part will somehow still get right answer.

1101 -3 13(U)
1110 -4 14(U)
10110 6 6(U)

Cause the actual effective bits are not really overflowed. (1101=101, 1110=10)

Shift

Power-of-2 Multiply with Shift

u<<k gives u*2^k(both signed and unsigned)

Shift is much faster than multiplication.

Unsigned Power-of-2 Divide with Shift

Use logical shift.(0s)

Signed Power-of-2 Divide with Shift

Use arithmetical shift.

(>>)1010 -6
(>>)1101—->1110 -3—->-2
1110—->1111 -2—->-1

Trick: before you shift, you should add a bias.(For this example—->1)
But for c, it seems we don’t use bias. (右移向下舍入)

Summary

Why should I Use Unsigned?

In java there is no unsigned.

A new principle: Don’t use without understanding implications.

1547173603445

Counting down with Unsigned:

1547173911825

Finally, Why should I Use Unsigned?

  • Do Use When Performing Modular Arithmetic
    • Multiprecision arithmetic
  • Do Use When Using Bits to Represent Sets
    • Logical right shift, no sign extension
  • Do Use In System Programming
    • Bit masks, device commands,…

Representation in memory, pointers, strings

Byte-Oriented Memory Organization

1547174591516

BUG: SegmentFault.

Trick: $2^{10}=1024\approx 1000=10^{3}$.

Machine Words: Any given computer has a “Word Size”.(32bit, 64bit)

For 32bits: limits addresses to 4GB(2^32 bytes)
For 64bits: limits addresses to 18PB(2^64 bytes)

Word-Oriented Memory Organization

Addresses Specify Byte Locations.

1547175024211

Example Data Representations:

1547175059616

Byte Ordering(Endianness)

  • Big Endian: Sun (Oracle SPARC), PPC Mac, Internet
    • Least significant byte has highest address
  • Little Endian: x86, ARM processors running Android, iOS, and Linux
    • Least significant byte has lowest address

Ex:(Variable x has 4-byte value of 0x01234567, Address given by &x is 0x100)

1547175239609

Other representation can be see in the slides.


Puzzles: (32 bits for int)

1547175886966

For x > y ---> -x < y, note that there is a special occasion:

For(x|-x)>>31==-1, consider x=0.

Forx>>3==x/8, using the ceil function(round down)

  • for/, always map close to 0
  • for>>, always map with ceil function

Floating Point

Background

What is 1011.101(2) ?

1547200593806

Representation:

  • Bits to right of “binary point” represent fractional powers of 2
  • Represents rational number

Here are some examples:

Value Representation
5 3/4 101.11
2 7/8 10.111
1 7/16 1.0111

Limitation:

  • Can only exactly represent numbers of the form x/2^k
    • Other rational numbers have repeating bit representations
  • Just one setting of binary point within the w
    • Limited range of numbers (very small values? very large ( ——> floating point)

IEEE standard

Numerical Form:

  • Sign bit s determines whether number is negative or positive
  • Significand M normally a fractional value in range [1.0, 2.0).
  • Exponent E weights value by power of two

1547201117356

There are mainly two types of floats:

1547201287683

Three “kinds” of floating point numbers:

1547201477780

“Normalized” Values: (Very large and small numbers)

  • When: exp ≠ 000…0 and exp ≠ 111…1
  • Exponent coded as a biased value: E = exp – Bias (To represent small float numbers, just like 2’s complement)
    • exp: unsigned value of exp field (Also: exp = E + Bias)
    • Bias = $2^{k-1} - 1$, where k is number of exponent bits
      • Single precision: 127 (exp: 1…254, E: -126…127
      • ▪ Double precision: 1023 (exp: 1…2046, E: -1022…1023)
  • Significand coded with implied leading 1: M = 1.xxx…x
    • xxx…x: bits of frac field
      • Minimum when frac=000…0 (M = 1.0)
      • Maximum when frac=111…1 (M = 2.0 – ε)
    • Get extra leading bit for “free”

Ex:

1547202141338

Note: 0<=Exp<=255, -127<=E<=128

Denormalized Values: ( represent values in $[0,1)$, especially 0)

  • Condition: exp = 000…0
  • Exponent value: E = 1 – Bias=$2-2^{k-1}$ =-126 (instead of exp – Bias) (why?)
  • Significand coded with implied leading 0: M = 0.xxx…x
    • xxx…x: bits of frac
  • Case
    • exp = 000…0, frac = 000…0
      • Represents zero value
      • Note distinct values: +0 and –0 (why? - signed)
    • exp = 000…0, frac ≠ 000…0
      • Numbers closest to 0.0
      • Equispaced (mean distribution)

Special Values: (Too Large to represent)

  • Condition: exp = 111…1
  • Case: exp = 111…1, frac = 000…0
    • Represents value $\infty$ (infinity)
    • Operation that overflows
    • Both positive and negative
    • E.g., $1.0/0.0 = +\infty ,−1.0/−0.0 = + \infty, 1.0/−0.0 = − \infty$ (Not number)
  • Case: exp = 111…1, frac ≠ 000…0
    • Not-a-Number (NaN)
    • Represents case when no numeric value can be determined
    • E.g., sqrt(–1),

Ex: (Do this and you will get the idea)

1547216351183

Example and properties

Tiny Floating Point Example:

1547217197184

8-bit Floating Point Representation
▪ the sign bit is in the most significant bit
▪ the next four bits are the exp, with a bias of 7
▪ the last three bits are the frac

As you can see as follows, look at the red arrows. The mechanism actually give a smooth change from De. to Nor.

1547217437064

the distribution gets denser toward zero: (1s-3e-2f)

1547217624711

6-bit IEEE-like format (close-up view): (1s-3e-2f)

1547217776981

Operations

Floating Point Operations: Basic Idea
▪ First compute exact result
▪ Make it fit into desired precision
▪ Possibly overflow if exponent too large
▪ Possibly round to fit into frac

Rounding(取整)

Rounding Modes: (illustrate with $ rounding)

1547218082911

Round to nearest, but if half-way in-between then round to nearest even(偶)

Ex.

1547218728343

Addition

1547219148576

Multiplication

1547219129898

Lecture note: here.

Floating point in C

  • C Guarantees Two Levels
    • float single precision
    • double double precision
  • Conversions/Casting
    • Casting between int, float, and double changes bit representation
    • double/float → int
      • Truncates fractional part
      • Like rounding toward zero
      • Not defined when out of range or NaN: Generally sets to TMin
    • int → double
      • Exact conversion, as long as int has ≤ 53 bit word size
    • int → float
      • Will round according to rounding mode

Floating Point Puzzles:

1547219829755

Summary

1547220077346

Machine Programming

Basics

Intel x86 Processors : Complex instruction set computer (CISC).

The class will mainly use gcc complier.

2018 State of the Art: (Coffee Lake)

1547220973797

History blablabla…

C, assembly, machine code

Definitions

  • Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand for writing correct machine/assembly code
    • Examples: instruction set specification, registers
    • Machine Code: The byte-level programs that a processor executes
    • Assembly Code: A text representation of machine code
  • Microarchitecture: Implementation of the architecture
    • Examples: cache sizes and core frequency
  • Example ISAs:
    ▪ Intel: x86, IA32, Itanium, x86-64
    ▪ ARM: Used in almost all mobile phones
    ▪ RISC V: New open-source ISA

Assembly/Machine Code View:

1547221546090

Turning C into Object Code

1547221638010

Ex:

1
gcc -Og -S sum.c

Use gcc to comply
-S: stop for the 1st step
-Og: what kind of optimization on code you choose (for debugging)
-O1: a lot of optimization( pretty hard to understand)

Data Types

C Data Type assembly suffix byte
char byte b 1
short word w 2
int (two) word l 4
char * (two) word l 4
float single precision s 4
double bio-precision l 8

Disassembly(反汇编)

Ex:

1
2
gcc -Og -S sum.c -o sum
objdump -d sum > sum.d //disassemly

sum is a binary file(to disassembly the origin function)

Registers

x86-64 Integer Registers:

1547223757301

%rax(%eax) reserves the function’s return value.
$eip stands for instruction register and can not be changed. (Not in the picture)

%rsp(%esp): stack pointer.
%rbp(%ebp): base pointer.

Note: (the stack’s in memory top has lower address) (See more in the Procedures)

1
2
3
4
5
6
pushl %ebp  #equal to
subl $4,%esp
movl %ebp,(%esp)
popl %ebp #equal to
movl (%esp),%ebp
addl $4,%esp

IA32 Registers:

1547223798589

Just a legacy thing.

Move

1
movq Source,Dest

1547224139238

Operand types showing above,

Ex: (Understanding Swap())

1547224429434

1
2
3
4
5
6
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Addressing Modes

Simple Memory Addressing Modes

  • Normal (R) Mem[Reg[R]]
    • Register R specifies memory address
      Aha! Pointer dereferencing in C
1
movq (%rcx),%rax
  • Displacement D(R) Mem[Reg[R]+D]
    • Register R specifies start of memory region
      Constant displacement D specifies offset
1
movq 8(%rbp),%rdx

Complete Memory Addressing Modes

  • Most General Form D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D]
    • ▪ D: Constant “displacement” 1, 2, or 4 bytes
      ▪ Rb: Base register: Any of 16 integer registers
      ▪ Ri: Index register: Any, except for %rsp
      ▪ S: Scale: 1, 2, 4, or 8 (why these numbers?)
  • Special Cases
    • (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
      D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
      (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]

Ex. (Addressing Modes)

1547224793696

Arithmetic & logical operations

Address Computation Instruction

1
leaq Src, Dst

Src is address mode expression
Set Dst to address denoted by expression

Uses: (Actually similar to move op)
▪ Computing addresses without a memory reference
▪ E.g., translation of p = &x[i];
▪ Computing arithmetic expressions of the form x + k*y
▪ k = 1, 2, 4, or 8

1547224928978

Difference from mov: (example)

1
2
3
4
5
6
7
mov 1(%eax),%ebx
# b = *(a + 1)
lea 1(%eax),%ebx
# b = a + 1
# equal to add & mov without reference
--------------------
# Consider %eax actually reserve an address, then lea load an address instead of load address's reference value in memory like mov

Some Arithmetic Operations

Two Operand Instructions: (src=source)

1547224970370

Watch out for argument order! Src,Dest
(Warning: Intel docs use “op Dest,Src”)
No distinction between signed and unsigned int (why?)

One Operand Instructions: (See book for more instructions)

1547225374772

Ex.

1547225425648

Control

Control: Condition codes

1547226503605

Compare Instruction:

1
cmpq Src2,Src1 # Src1 - Src2

Test Instruction:

1
testq Src2,Src1 # Src1 & Src2

Those instructions doesn’t change the value of registers.

SetX Instructions:

1547227143624

Help to change bit level values(t low-order byte of destination) for arithmetic.

Ex.

1547227290214

Note: (Intel x86 machine sets the upper bits to zeros implicitly)

1547227329427

Conditional branches

Jumping

1
jmp .L1 #address

1547227524003

Ex.

1547227649631

It’s behavior is just like to “goto” statement in C.

Condition move: (if the instruction is simple, calculate it first and then judge)

1547274552015

Loops

Do-While” Loop Example: (it’s not often used)

1547274736995

While Loop Example: (there is an initial test)

1547275528661

“For” Loop Form: (can be converted into other 2 kinds of loops)

1547275625693

Switch Statements

Ex. (use “break” to end early)

1547275962727

Jump Table Structure: (Avoid to step though all the codes)

1547276298488

Ex:

1547276450387

ja: jump above. (x>6 and x<0 will jump to default case)
jmp: goto jump table

.quad: state that need a 8byte value here.

Jump table:

1547276715940

Code Blocks (x == 2, x == 3): (Note that the block doesn’t have some orders)

1547277014211

Code Blocks (x == 5, x == 6, default):

1547277085015

Summary:

1547277972355

Procedures

Function, method, procedure….

Mechanisms

1547279264175

Stack Structure

x86-64 Stack:

1547279413442

Note: the direction are controversial maybe.
(The memory doesn’t change, only the value of %rsp)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#the %ebp contains a value to save
pushl %ebp #equal to
subl $4,%esp
movl %ebp,(%esp)
popl %ebp #equal to
movl (%esp),%ebp
addl $4,%esp
----------------------
pushl %rbp #equal to
subl $8,%rsp
movl %rbp,(%rsp)
popl %rbp #equal to
movl (%rsp),%rbp
addl $8,%rsp
1
2
3
leave #equal to
movl %ebp,%esp
popl %ebp

Ex.

1547279843850

A example: Here. (Chinese)

Calling Conventions

Passing control

1
callq address <func_name>

1547280032115

Passing data

1547280231780

Ex.

1547280313959

Managing local data

Call Chain:

1547280660054

Stack Frames: (%rsp)

1547280713807

Example see in the slides.(Also the Illustration of Recursion part)

Data

silde.

Arrays

Easy.

Ex.

1547283990640

Structures

1547286561100

Alignment: (对齐)

1547286715157

Alignment is depending on the next data type.

1547287192695

Save space:

1547287260503

Floating Point

Similarly.

Advanced Topics

slide.

Memory Layout

1547288158389

Buffer Overflow

when exceeding the memory size allocated for an array.

String Library Code:

1547319959091

Ex; (Buffer Overflow Disassembly)

1547320185281

Things can go wrong without a crash.

For string, note that there is a byte order.

Avoid Overflow Vulnerabilities in Code:

1
fgets(buf, 4, stdin);

System-Level Protections can help.

  • Randomized stack offsets

Nonexecutable code segments.

Stack Canaries can help.

  • Idea
    ▪ Place special value (“canary”) on stack just beyond buffer
    ▪ Check for corruption before exiting function
  • GCC Implementation
    ▪ -fstack-protector
    ▪ Now the default (disabled earlier)

Unions

1547322998178


Byte Ordering Example: (only ordered inside a single data)

1547323271495

1547323296428

1547323287363

Code Optimization

slides.

Generally Useful Optimizations

Optimizations that you or the compiler should do regardless of processor / compiler.

Code Motion

1547323914397

Reduction in Strength

1547324007937

Share Common Subexpressions

1547324041845

Optimization Example: Bubblesort

1547324174042

Optimization Blockers

Procedure calls

1547324425190

Note the function strlen(), it will be running the whole time and make the program perform badly.($O(n^2)$)

1547324625791

Memory aliasing

1547324933006

Removing Aliasing: (don’t use reference unless you have to)

1547325042701

Exploiting Instruction-Level Parallelism

1547325479644

Cycles Per Element (CPE)

1547325731482

Benchmark Performance:

1547325930622

Loop Unrolling

1547327034519

Multiple Accumulators

Loop Unrolling with Separate Accumulators (2x2):

1547327354252

Summary:

1547331350241

The Memory Hierarchy

slides.

Random-Access Memory(RAM).

RAM comes in two varieties:
▪ SRAM (Static RAM)
▪ DRAM (Dynamic RAM)

Disk.

Solid State Disks(SSD).

Bus.(总线)

1547333849790

Locality of reference

1547333277107

Caching in the memory hierarchy

1547334354630

General Cache Concepts:

See in the slides.

3 Types of Cache Misses:

1547335369033

For conflict miss, each block represent one specific next level cache block. (Can’t use to save other values)

Cache Memories

slides.

1547335909700


Cache organization and operation

1547335981511

1547335994312

Direct Mapped Cache (E = 1): (One line per set. Assume: cache block size B=8 bytes)

1547336274671

Locate set
Check if any line in sethas matching tag
Yes + line valid: hit
Locate data starting at offset

If tag doesn’t match (= miss): old line is evicted and replaced

Ex. (Direct-Mapped Cache Simulation)

1547337293519

E-way Set Associative Cache (Here: E = 2):

1547338153960

Ex. (2-Way Set Associative Cache Simulation)

1547338419283

What about writes?

1547339208689

Performance impact of caches

The memory mountain

Read throughput (read bandwidth)
▪ Number of bytes read from memory per second (MB/s)
Memory mountain: Measured read throughput as a function of spatial and temporal locality.
▪ Compact way to characterize memory system performance.

1547339979979

Rearranging loops to improve spatial locality

Ex. (Matrix Multiply Performance)

1547340307928

1547340272596

Using blocking to improve temporal locality

See in the slides.

Linking

slides.

Ex. (Example C Program)

1547340841183

Programs are translated and linked using a compiler driver:

1
2
linux> gcc -Og -o prog main.c sum.c
linux> ./prog

1547340897117

Why Linkers?

  • Reason 1: Modularity
  • Reason 2: Efficiency (time/space)

What Do Linkers Do?

  • Step 1: Symbol resolution (During symbol resolution step, the linker associates each symbol reference
    with exactly one symbol definition.)
  • Step 2: Relocation (Merges separate code and data sections into single sections)

Three Kinds of Object Files (Modules)

  • Relocatable object file (.o file)
  • Executable object file (a.out file)
  • Shared object file (.so file)

Linker Symbols

1547341535201

just like: public, friend, and private.

Program symbols are either strong or weak:

  • Strong: procedures and initialized globals
  • Weak: uninitialized globals
    Or ones declared with specifier extern

1547372123409

Rule 1: Multiple strong symbols are not allowed

  • Each item can be defined only once
    Otherwise: Linker error

Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol

  • References to the weak symbol resolve to the strong symbol

Rule 3: If there are multiple weak symbols, pick an arbitrary
one

  • Can override this with gcc –fno-common

Ex. (Linker Puzzles)

1547372584466

Global Variables: (caution)

  • Avoid if you can
  • Otherwise
    § Use static if you can
    § Initialize if you define a global variable
    § Use extern if you reference an external global variable
     § Treated as weak symbol
     § But also causes linker error if not defined in some file
    

关于text段、data段和bss段

Shared Library (Modern Solution)

地址重定位:静态重定位和动态重定位

Case study: Library interpositioning

1547376541373

1547376555472

ECF(Exceptional Control Flow)

Exceptions & Processes

Exceptional Control Flow

1547377173686

Exceptions

1547377211891

Interrupts(Asynchronous Exceptions)

1547377607727

Synchronous Exceptions

1547377563056

Processes

1547378100305

Process Control

Obtaining Process IDs

pid_t getpid(void)

  • Returns PID of current process

pid_t getppid(void)

  • Returns PID of parent process

Creating and Terminating Processes

void exit(int status)

terminated.

Convention: normal return status is 0, nonzero on error

int fork(void)

Returns 0 to the child process, child’s PID to parent process。(called once, return twice

Ex. (can’t predict the exact return value of fork)

1547381165003

Two consecutive fork:

1547391612972

Nested forks in parent: (Similar)

1547391661619

1547391679091

More exercise: Sample, Modor

int wait(int *child_status)

return child’s PID if succeed, -1 otherwise.

Ex.

1547393088942

Ex. (fork + wait)

1547392851262

Note: (The processes will run parallelly unless some of them have to wait)

1547395602746

int execve(char *filename, char *argv[], char *envp[])

  • never returns.

Signals & Nonlocal Jumps

slides.

setjmp

  • use once, return twice

longjmp

  • use once and never return

Virtual Memory

Concepts

slides.

Address spaces

1547397352855

Why Virtual Memory (VM)?

  • Uses main memory efficiently
  • Simplifies memory management
  • Isolates address spaces

VM as a tool for caching

1547397695292

Enabling Data Structure: Page Table

A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
▪ Per-process kernel data structure in DRAM

1547398071042

Page hit: reference to VM word that is in physical memory (DRAM cache hit)
Page fault: reference to VM word that is not in physical memory (DRAM cache miss)
Handling Page Fault: Page miss causes page fault (an exception)

Locality to the Rescue Again!

1547398461378

VM as a tool for memory management

Key idea: each process has its own virtual address space

1547399758879

VM as a tool for memory protection

1547400232218

Address translation

1547400371064

Address Translation: Page Hit

1547400749761

Address Translation: Page Fault

1547400731481


Integrating VM and Cache

1547400842787

VA: virtual address, PA: physical address, PTE: page table entry, PTEA = PTE address.

1547401434276

Multi-Level Page Tables

1547401541948

A Two-Level Page Table Hierarchy

1547401630774

Translating with a k-level Page Table

1547401741126

Systems

1547401893430

Case study: Core i7/Linux memory system

None.

Some page were marked to be protected by core so that the user can’t alter.

There are L1~L4 page tables.

Trick: VPO=PPO=Physical Address. (The lower bits of Block address)

Memory mapping

The address translation is always running.

Dynamic Memory Allocation

Basic

1547632217288

void* malloc(size_t size)

  • get a block by given size

void free(void *p)

  • free the block at the address

calloc: version of malloc that initializes block to 0

realloc: change the size of it

sbrk: control allocators to grow or shrink the heap

Throughout: number of complete requests per unit time.

5000 malloc + 5000 free in 10 sec, then throughout will be 1000 operation/sec.

Peak Memory Utilization:

  • aggregate payload $P_k​$.

  • Current heap size $H_k$

  • $Uk=max{i\leq k}P_i/H_k$

Can cause by fragmentation.(internal or external)

1547633458529

Advanced

Explicit free lists

Keeping Track of Free Blocks

1547632384923

Explicit free lists:

1547632763444

Freeing With Explicit Free Lists:

1547632870380

Often use BBST to manage the allocated memory.

Explicit List Summary:

1547633702762

Segregated free lists

Each size class of blocks has its own free list.

1547633841902

1547633866174

1547633877113

Garbage collection

Garbage collection :automatic reclamation of heap-allocated storage—application never has to explicitly free memory.

Common in many dynamic languages:

▪Python, Ruby, Java, Perl, ML, Lisp, Mathematica

The tech is still ongoing.

1547634384794

Graph:

1547634341801

Mark and Sweep Collecting:

1547634660319

Assumptions For a Simple Implementation:

1547634676657

Mark and Sweep (cont.)

1547634637283

Dereferencing Bad Pointers

The classic scanf bug:

1547634921046

Reading Uninitialized Memory

1547635004089

Overwriting Memory

1547635074780

1547635083354

1547635094290

1547635106717

1547635130333

Referencing Nonexistent Variables

1547635148682

Freeing Blocks Multiple Times

1547635163203

Referencing Freed Blocks

1547635195448

Failing to Free Blocks (Memory Leaks)

1547635216062

1547635226516

Ex.

1547634870116

1547634883889

System-Level I/O

slides.

1547636269869

Unix I/O

1547636283880

Unix I/O Overview:

1547636304283

File Types

1547636442697

Regular Files

1547636488286

Windows type \r\n actually relates to the usage of typewriter.

Pathname

1547637170160

Opening Files

1547637196057

Closing Files

1547637230296

Reading Files

1547637274563

Writing Files

1547637288790

Ex. (sample)

1547637342089

Metadata. sharing, and redirection

1547637499977

1547638259903

Ex. (Example of Accessing File Metadata)

1547639415439

File Sharing

1547638324632

1547638404001

I/O Redirection

1547638494620

Standard I/O

Functions:

1547639071441

Std. I/O Streams:

1547639102970

Buffering in Standard I/O:

1547639217650

1547639252437

RIO(robust I/O) package

1547637720207

Buffered I/O: Implementation:

1547637828298

Closing remarks

1547639020855

For more information:

1547639383577

Network Programing

slide1, slide2.

A Client-Server Transaction

1547639774401

Computer Networks

1547639820521

1547639834807

Structure

Lowest Level: Ethernet Segment

1547640042340

Broadcast mechanism.

Next Level: Bridged Ethernet Segment

1547640030470

LAN: Local area network.
WLAN: Wireless local area network.

Next Level: internets

1547640194691

1547640020500

Transferring internet Data Via Encapsulation:

1547640403914

Global IP Internet

1547640557961

Internet Domain Name

1547641215724

Domain Naming System (DNS)

1547641252464

can be multi-multi mapping.

Sockets

1547642020101

1547642053038

1547642145169

1547642156153

Socket Interface

1547642365345

The advanced topic can be seen in the video and related slides.

Concurrent programming

slides.

The lecturer always says “good to see you”.

1547646573504

Situation Examples

1547646713486

Data Race:

1547646641411

Deadlock:

1547646688116

Starvation:

1547646755864

1547646950976

Iterative Servers

1547647075791

Iterative servers process one request at a time.

1547647023405

Fundamental Flaw of Iterative Servers: (example)

1547647298201

Writing Concurrent Servers

Approaches for Writing Concurrent Servers:

1547647413898

Approach 1: Process-based Servers(进程)

1547647627512

Approach 2: Event-based Servers

1547649281259

I/O Multiplexed Event Processing:

1547649302626

Approach 3: Thread-based Servers(线程)

Very similar to approach #1 (process-based)

…but using threads instead of processes

1547649366445

1547649399772

1547650144510

1547650155901

Summary: Approaches to Concurrency

1547650200705

Synchronization

slides-basic, slides-advanced.

Threads review

1547650418667

1547650457857

Sharing

Ex. (Example Program to Illustrate Sharing)

1547650532298

Mapping Variable Instances to Memory:

1547650564204

Ex.

1547650619028

Ex. (Shared Variable Analysis)

1547650823729

1547652937800

Mutual exclusion

1547652958297

Semaphores

1547652991943

Summary:

1547653037579

Using semaphores to schedule shared resources

Producer-consumer problem

1547653614225

Note: limited buffer

Maintain two semaphores: full + empty

1547653662469

Circular Buffer

1547654854691

1547654864042

Readers-writers problem

1547657321523

Other concurrency issues

Thread safety

1547657606023

Races

1547657467677

Deadlocks

1547657507570

Avoid:

1547657532124

Thread-Level Parallelism

slides.

Longer term perspective. Think bigger!

Today

1547657871360

1547658046864

Just see the video.

Future of Computing

slides: part1, part2.

Moore’s Law

1547665746325


1547660408168

The blue ones are actually for celphones.
The general trend is produced by regression.

It’s a prediction rather than a law.

1547666287044

Technological singularity:

1547666579930

1547667467170

1547668381602

Ex. (3D structure)

1547668475898

1547668534807

High-Performance Computing

1547668699471

1547668729420

What’s So Special About Big Learning?

This part is available only on 2017 Fall version or after.

Please just see the slides and watch the video.

GraphLab, GraphLab-Guide_Chinese.

1547669478016

1547669717022


1547669625241

0%