Tuesday, November 22, 2005

Python notes


  • os.environ.get('USER')
  • os.path.isfile(filename)
  • float(a), abs(a), math.sqrt(a) # sqrt requires import math
  • lists (mutable), tuples (immutable):
    • a[0:2] = [], len(a), a.append('element')
    • a[1:1] = ['extra', 'fields', 'inserted', 'after 0', 'before 1']
    • a.append(x) <=> a[len(a):] = [x]
    • a.insert(i,x) <=> a[i:i] = [x]
    • a.remove(x) # removes the first element equal to x; raises ValueError if not found
    • a.pop(i), a.index(x), a.count(x), a.sort(), a.reverse()
    • del a[i], a = [0] * 100
    • Python 2.3:
      • sum(a, start) = start + sum(a)
  • for i in range(2,10): # (use xrange for long sequences)
  • for x in a[:]: # iterates over a copy of a; needed if the loop body modifies a
  • if ok in ('y', 'ye', 'yes'): return True
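The slice equivalences above can be checked directly. This is a minimal sketch with made-up sample lists (the names a, b and words are only for illustration):

```python
# Slice assignment can express the common list methods.
a = ['zero', 'one', 'two']

# Insert several items after index 0 and before index 1.
a[1:1] = ['extra', 'fields']

# a.append(x) is equivalent to a[len(a):] = [x]
b = list(a)
a.append('end')
b[len(b):] = ['end']
same_after_append = (a == b)

# a.insert(i, x) is equivalent to a[i:i] = [x]
a.insert(2, 'mid')
b[2:2] = ['mid']
same_after_insert = (a == b)

# Assigning an empty list to a slice deletes those items.
a[0:2] = []

# sum(a, start) adds start to the sum of the items.
total = sum([1, 2, 3], 10)

# Iterate over a copy (words[:]) when the loop body modifies the list.
words = ['cat', 'window', 'defenestrate']
for w in words[:]:
    if len(w) > 6:
        words.insert(0, w)
```

Without the [:] copy, the last loop would keep finding the element it just inserted and never finish.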
Functions:
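  • Default argument values are evaluated only once, at definition time, so a mutable default behaves like a class (static) variable shared across calls:
    • def f(a, L=[]):
      • L.append(a); return L
  • Another way to keep function 'class' state is a closure, here in lambda form: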
    • def make_incrementor(n):
      • return lambda x: x + n
  • def f(x, *arguments, **keywords):
    • # arguments is a tuple of the extra positional arguments; keywords is a dict of the extra keyword arguments
    • keys = keywords.keys(); keys.sort()
    • for kw in keys: print kw, ':', keywords[kw]
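A small sketch of the three function notes above, written so it also runs on Python 3. The helper g (the usual None-default workaround) and the names add5 and show are illustrative additions, not part of the original notes:

```python
def f(a, L=[]):
    # The default list is created once, when the function is defined,
    # so every call without L shares the same list.
    L.append(a)
    return L

first = list(f(1))    # snapshot after the first call
second = list(f(2))   # the shared default has grown

def g(a, L=None):
    # Assumed workaround: use None as the default and build a fresh list.
    if L is None:
        L = []
    L.append(a)
    return L

def make_incrementor(n):
    # The lambda remembers n from the enclosing call.
    return lambda x: x + n

add5 = make_incrementor(5)

def show(x, *arguments, **keywords):
    # arguments is a tuple, keywords is a dict.
    lines = [str(x)]
    for arg in arguments:
        lines.append(str(arg))
    for kw in sorted(keywords):
        lines.append('%s: %s' % (kw, keywords[kw]))
    return lines

report = show('a', 1, 2, z=3, y=4)
```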
Functional Programming:
  • filter(bool_function, sequence)
  • map(function, sequence)
  • reduce(function, sequence[, init_value])
  • [x * y for x in range(10) if x > 3 for y in range(3)]
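A quick sketch of the same tools on a made-up range of numbers. It assumes Python 3, where filter and map return iterators and reduce has to be imported from functools (in Python 2 it is a builtin):

```python
from functools import reduce

numbers = range(1, 11)

# filter keeps the items for which the function returns true.
evens = list(filter(lambda n: n % 2 == 0, numbers))

# map applies the function to every item.
squares = list(map(lambda n: n * n, numbers))

# reduce folds the sequence into one value, starting from init_value.
total = reduce(lambda acc, n: acc + n, numbers, 0)

# The comprehension reads left to right like nested loops.
products = [x * y for x in range(10) if x > 3 for y in range(3)]

# The same thing written out as explicit loops.
expanded = []
for x in range(10):
    if x > 3:
        for y in range(3):
            expanded.append(x * y)
```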
Sets (of unique elements):
  • A=set([...]), B=set([...])
  • A - B (difference), A | B (union), A & B (intersection), A ^ B (symmetric difference)
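The four operators on two small example sets (the values are arbitrary):

```python
A = set([1, 2, 3, 4])
B = set([3, 4, 5])

difference = A - B      # in A but not in B
union = A | B           # in A or B
intersection = A & B    # in both A and B
symmetric = A ^ B       # in exactly one of A and B
```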
Dictionaries: {x: y,...}
  • a.has_key(x), a.keys(), a[x], del a[x], len(a)
  • a.get(x[, y]) # returns a[x] if x is in a, otherwise y (None if y is omitted)
  • for x, y in a.iteritems(): # also use iterkeys() and itervalues()
  • dict([(x, x * x) for x in range(10)])
  • from Python 2.3:
    • dict.fromkeys(keys, default_value)
    • a.pop(key) # returns a[key] and deletes it from a
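A short sketch of the dictionary calls above. It is written for Python 3, so it uses `x in a` and items() where the notes use the Python 2 spellings has_key() and iteritems(); the sample data is made up:

```python
squares = dict([(x, x * x) for x in range(5)])

has_three = 3 in squares            # Python 2 also allows squares.has_key(3)
with_default = squares.get(10, -1)  # key is missing, so the default comes back
without_default = squares.get(10)   # no default given, so None

# fromkeys builds a dict with the same value for every key.
flags = dict.fromkeys(['a', 'b'], 0)

# pop returns the value and removes the key.
popped = squares.pop(4)

# items() yields (key, value) pairs.
pairs = sorted(squares.items())
```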
Looping:
  • for i, v in enumerate(['tic', 'tac', 'toe']): # since Python 2.3
  • for q, a in zip(questions, answers):
  • for x in zip(*rows): # iterates over the rows in parallel, yielding columns; the result is truncated to the length of the shortest row
  • for f in sorted(set([...])):
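The looping helpers above, collected into lists so the results are easy to inspect. The data is made up, and list() is needed around zip only on Python 3:

```python
indexed = list(enumerate(['tic', 'tac', 'toe']))

questions = ['name', 'quest']
answers = ['lancelot', 'the holy grail']
qa = ['%s? %s' % (q, a) for q, a in zip(questions, answers)]

# zip(*rows) transposes rows into columns; the short last row
# cuts the result down to two columns.
rows = [[1, 2, 3], [4, 5, 6], [7, 8]]
columns = list(zip(*rows))

# set() drops duplicates, sorted() returns a new ordered list.
unique_sorted = sorted(set(['pear', 'apple', 'pear', 'fig']))
```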
Comparisons:
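  • a is b # checks whether a and b are the same object; a == b compares values. The difference matters for mutable objects.
  • a < b == c # chained comparison: true if a < b and b == c
  • sequences are compared lexicographically, so 'a' < 'ab'
  • comparing objects of different types is not meaningful: Python 2 allows it, but the resulting order is arbitrary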
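A sketch of the comparison rules with throwaway values. The last part assumes Python 3, where ordering unrelated types raises TypeError instead of returning an arbitrary result:

```python
a = [1, 2, 3]
b = a          # second name for the same list
c = list(a)    # equal values, different object

same_object = a is b
equal_values = a == c
distinct_objects = a is not c

# Chained comparison: 1 < 2 and 2 == 2.
chained = 1 < 2 == 2
chained_false = 1 < 3 == 2

# Lexicographic ordering: a prefix sorts before the longer sequence.
prefix_first = 'a' < 'ab'
list_order = [1, 2] < [1, 2, 0]

# Mixed-type ordering (Python 3 behaviour assumed here).
try:
    1 < 'a'
    mixed = 'allowed'
except TypeError:
    mixed = 'TypeError'
```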
Classes:
  • Always use dir(object) and help(object) to determine what attributes and functions it has
  • self is not a special word, although it is the name used most often.
  • class Derived(Base):
    • def __init__(self,blah,baseblah):
      • Base.__init__(self,baseblah)
      • self.blah = blah
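The Derived/Base outline above, filled in just enough to run. The body of Base is an assumption, since the notes only show the derived class:

```python
class Base(object):
    def __init__(self, baseblah):
        self.baseblah = baseblah


class Derived(Base):
    def __init__(self, blah, baseblah):
        # Call the base initializer explicitly, passing self along.
        Base.__init__(self, baseblah)
        self.blah = blah


d = Derived('x', 'y')

# dir() lists attribute names; filter out the underscore ones.
public_names = sorted(n for n in dir(d) if not n.startswith('_'))
```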
Style comments:
  • from foo import bar: bar should be a package, never a variable
  • use exceptions rather than log+exit
  • use underscore_name rather than camelCase
  • one blank line between definitions within a class, two blank lines otherwise
  • iterate over file objects line by line rather than loading the whole file
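A minimal sketch of the file-iteration advice, using a temporary file so it is self-contained. The function name count_nonblank_lines and the sample contents are made up for illustration:

```python
import os
import tempfile


def count_nonblank_lines(path):
    # Iterating over the file object reads one line at a time,
    # so the whole file never has to fit in memory.
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.strip():
                count += 1
    return count


# Build a small throwaway file to run the function on.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as handle:
    handle.write('first\n\nsecond\n')

nonblank = count_nonblank_lines(demo_path)
os.remove(demo_path)
```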
Interesting modules:
  • csv, timeit, bz2, datetime, ossaudiodev, pickle, cPickle, random, itertools

Saturday, October 15, 2005

Regular expressions

Generally there are two kinds of regular expression implementations: NFA (regex-directed) and DFA (text-directed). The most popular implementations are regex-directed, among other reasons because they allow lazy matching and backreferences. This really is not going to work... In short, this is my page for reminding myself of the syntax. Because implementations differ, these definitions are only approximate, and expressions should be tried out first.

  • [^ ] - negated character class: matches any character not listed inside
  • \d - digit, i.e. [0-9]; \D=[^\d] - negated \d
  • \w - word characters, usually [A-Za-z0-9_] but maybe more, depending on the flavor; \W=[^\w]
  • \s - whitespace characters, e.g. \s=[ \t\r\n]; \S=[^\s]
  • . - matches all except new line. So on Unix, .=[^\n], Windows .=[^\r\n], Mac ... who knows? new line used to be marked with \r, but OS X is Unix, so I guess it's just \n now.
  • {min,max} - specifies number of repetitions for the previous expression (char or ())
  • + - repeat 1 or more times, greedy; +={1,}
  • * - repeat 0 or more times, greedy; *={0,}
  • Anchors:
    • ^ - beginning, $ - end of the line
    • \A - beginning, \Z - end of the string (possibly multiple lines)
    • \b - word boundary
  • (...)? - optional inside (), greedy
  • ? - can turn greedy into lazy, e.g. +?, *?, ??
  • Backreferences:
    • ( ) - capturing group: remembers the matched text so it can be referenced later
    • (?: ) - non-capturing group: groups without creating a reference
    • \1, \2 ... - the usual access method for backreferences; e.g. for capturing an HTML tag together with its closing tag one can use something like: <([A-Z][A-Z0-9]*)[^>]*>.*?</\1>
    • Python named capturing group: (?P<name>group), referenced with \1 or (?P=name)
    • Atomic grouping: (?>expression) - once expression has matched, the engine will not backtrack into it; the group is also non-capturing.
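Since the definitions above are approximate, here is a short sketch that tries several of them in one concrete flavor, Python's re module. The sample strings are made up, and other engines may behave differently:

```python
import re

text = '<B>bold</B> and <I>italic</I>'

# Greedy + grabs as much as possible; lazy +? stops at the first '>'.
greedy = re.findall(r'<.+>', text)
lazy = re.findall(r'<.+?>', text)

# \1 must repeat whatever the first group captured (the tag name).
tag_pattern = re.compile(r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>')
tags = tag_pattern.findall(text)

# Named group, referenced again with (?P=name).
doubled = re.compile(r'(?P<word>\b\w+) (?P=word)\b')
repeated = doubled.search('it is is fine').group('word')

# (?: ) groups without capturing, so only the digit is returned.
digits = re.findall(r'(?:ab)+(\d)', 'abab1 ab2')

# ^ matches at every line start with MULTILINE; \A only at the string start.
lines = 'one\ntwo'
line_starts = re.findall(r'^\w+', lines, re.MULTILINE)
string_start = re.findall(r'\A\w+', lines, re.MULTILINE)

# {min,max} limits the number of repetitions.
two_to_four = [s for s in ['1', '123', '12345'] if re.match(r'\d{2,4}$', s)]
```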