# CompSciWeek6

## Contents

- Beginning Python - skim chapters 8-14 (use as reference material)
  - see especially urlopen on p. 300, forks and threads on p. 304

- Beginning Python - Chapter 15 (Web services)

# Class 1: Effective Design

- Structured Code, Bioinformatics example from AOS Book
- Code Testing
- Source Code Versioning
  - basic git

# Class 2: Using HPC Resources

- Accessing binaries and libraries, using modules
- Using scratch space
- Submitting a job script
- Managing queued jobs
- Advanced scripting tips and tricks
  - awk

# Homework 4 (Due Fri., Oct. 10)

Please email the completed homework with the subject line "SciComp HW4, (your name)"

- Write example functions that use the advanced function notation from Beginning Python, Ch. 6 (see especially the example on p. 124).
  - f(arg=default): the function should do nothing if the function is called as f(), and it should call arg.set_price(12) if it is called as f(type("InvItem", (), {"set_price":(lambda a,b: b)})())
  - f(*arg): the function should return the number of arguments passed in the call f('a', 'b', 1, 5, {'t': [4]})
  - f(**args): the function should return the value associated with the key "agent" in the call f(auto="DB5", lno=31337, agent="007")

- Write an example python class to represent a general inventory item. It should store its own name, and must contain the following methods: getCount(), returning the (arbitrary, fixed) number of items in inventory, and getPrice(), which computes the price using the formula price = price0 - k*log(count), where price0 and k are arbitrary, fixed variables belonging to the object.
- The article "Working with Big Data in Bioinformatics" describes software that reads lots of small strings and increments some counters for each string. The overall structure of their code contains a fast C++ library, a Python wrapper, and Python scripts. Describe which of those three categories you would place each of the following routines in, and why.
  - A class that creates C++ objects representing counters for sequence data and that contains methods for translating the counts to numpy arrays.
  - A script that creates a plot of the k-mer counts in a subset of the data.
  - A function reading and parsing files containing genomic sequence data.
  - A script installing the complete Khmer package (compiling the C++ library, copying the Python package, etc.)

- Explain (without trying to solve their problems) why each of the following quotes from the article might be relevant to the performance of their code:
  - "We expected the highest traffic to be in the k-mer counting logic."
  - "Redundant calls to the toupper function were present in the highest traffic regions of the code."
  - "Input of genomic reads was performed line-by-line and on demand and without any readahead tuning."
  - "A copy-by-value of the genomic read struct [was] performed for every parsed and valid genomic read."

# Codes

Power function with logarithmic run time in n (linear in the *size* of n):
<source lang="python">
def pow(x, n):
    """ Returns the number x raised to the integer power, n.

    >>> pow(2, 4)
    16
    >>> pow(3, 2)
    9
    >>> pow(5, 0)
    1

    complexity = O(log n) = O(m), where m = # digits in n
    """
    if n < 1:
        return 1 # correct for n=0
    elif n == 1:
        return x
    elif n % 2 == 0:
        hp = pow(x, n//2) # // keeps the exponent an integer
        return hp*hp
    else: # 3, 5, 7, ...
        hp = pow(x, (n-1)//2)
        return x*hp*hp
</source>
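
To see the logarithmic behavior concretely, the sketch below repeats the same recursion with a call counter attached. The names `power`, `calls`, and `count_calls` are illustrative only and are not part of the course code. Under these assumptions, the number of calls for n >= 1 equals the number of binary digits of n.

```python
def power(x, n, calls=None):
    """Same halving recursion as pow() above.

    calls is an optional one-element list used as a counter
    (a hypothetical helper, added only for this demonstration).
    """
    if calls is not None:
        calls[0] += 1
    if n < 1:
        return 1
    elif n == 1:
        return x
    hp = power(x, n // 2, calls)  # n//2 equals (n-1)//2 when n is odd
    if n % 2 == 0:
        return hp * hp
    return x * hp * hp

def count_calls(n):
    """Number of calls power() makes for exponent n."""
    calls = [0]
    power(2, n, calls)
    return calls[0]

for n in (1, 10, 100, 1000, 10**6):
    # the call count tracks the bit length of n, not n itself
    print("%d %d %d" % (n, count_calls(n), n.bit_length()))
```

Raising the exponent from 1000 to 10**6 only doubles the number of calls (10 versus 20), which is what "linear in the size of n" means here.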

Testing the pow() function above with Python's doctest module:
<source lang="python">
#!/usr/bin/env python

if __name__ == "__main__":
    import doctest, vector # assumes pow() is defined in vector
    doctest.testmod(vector)
</source>
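
The script above needs a separate `vector` module on the import path. As a self-contained sketch of the same idea, the block below runs the examples in one function's docstring and reports how many were tried and how many failed. The names `square` and `run_doctests` are made up for this illustration.

```python
import doctest

def square(x):
    """Returns x squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x

def run_doctests(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # the globals dict only needs the name the examples refer to
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted

print(run_doctests(square))
```

For a module stored in a file, the doctests can also be run from the command line with `python -m doctest -v vector.py`, with no driver script.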

Using the python-geocoder-0.2 interface to Google's web API to get distances:
<source lang="python">
from geocode.google import GoogleGeocoderClient
from numpy import array, arccos, cos, dot, pi, sin

geocoder = GoogleGeocoderClient(False) # must specify sensor parameter explicitly

def to_xyz(phi, th):
    # phi = polar angle measured from the north pole, th = longitude (radians)
    s = sin(phi)
    return array([s*cos(th), s*sin(th), cos(phi)])

def to_polar(lat, lon):
    # degrees of latitude, longitude -> (polar angle, longitude) in radians
    return (90-float(lat))*pi/180.0, float(lon)*pi/180.0

def dist(a, b): # distance in kilometers across a perfect sphere of radius 6370 km
    return 6370*arccos(dot(to_xyz(*a), to_xyz(*b)))

def get_loc(name):
    result = geocoder.geocode(name)
    if result.is_success():
        return to_polar(*result.get_location())
    else:
        print("Geocoding failed")
        return (0.0, 0.0)

a = get_loc("Lowry Park Zoo") # spherical polar
b = get_loc("MOSI, Tampa, FL")

print(dist(a, b))
</source>
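
The distance formula can be checked without network access or the geocoder package. The sketch below uses only the standard library and takes latitude and longitude in degrees directly. The name `great_circle_km` is illustrative, and the test points are chosen so that the answer is known in advance: a quarter of a great circle on a sphere of radius 6370 km.

```python
from math import acos, cos, sin, radians, pi

R_EARTH_KM = 6370.0  # same sphere radius as in the listing above

def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two (latitude, longitude) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlon = radians(lon2 - lon1)
    # spherical law of cosines: cosine of the central angle
    c = sin(p1)*sin(p2) + cos(p1)*cos(p2)*cos(dlon)
    # clamp to [-1, 1] to guard against rounding error before acos
    return R_EARTH_KM * acos(max(-1.0, min(1.0, c)))

print(great_circle_km(0, 0, 0, 90))   # quarter of the equator
print(great_circle_km(90, 0, 0, 45))  # north pole to the equator
print(R_EARTH_KM * pi / 2)            # expected quarter-circle length
```

All three printed values should agree to within rounding error. The dot product of two unit vectors, as used in the listing above, is the same quantity `c` written in Cartesian form.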