CompSciWeek6

From Predictive Chemistry
Revision as of 15:01, 1 October 2014 by David M. Rogers (Codes)


Reading (shared with Week 7)

  • Beginning Python - skim Chapters 8-14 (use as reference material)
    • see especially urlopen on p. 300, forks and threads on p. 304
  • Beginning Python - Chapter 15 (Web services)

Class 1: Effective Design

  • Structured Code, Bioinformatics example from AOS Book
  • Code Testing
  • Source Code Versioning
    • basic git

Class 2: Using HPC Resources

  • Accessing binaries and libraries, using modules
  • Using scratch space
  • Submitting a job script
  • Managing queued jobs
  • Advanced scripting tips and tricks
    • awk

Homework 4 (Due Fri., Oct. 10)

Please email the completed homework with the subject line "SciComp HW4, (your name)"

  1. Write example functions that use the advanced function notation from Beginning Python, Ch. 6 (see especially the example on p. 124).
    1. f(arg=default): the function should do nothing when called as f(), and it should call arg.set_price(12) when called as f(type("InvItem", (), {"set_price":(lambda a,b: b)})())
    2. f(*arg): the function should return the number of arguments passed in the call f('a', 'b', 1, 5, {'t': [4]})
    3. f(**args): the function should return the value associated with the key "agent" in the call f(auto="DB5", lno=31337, agent="007")
  2. Write an example Python class to represent a general inventory item. It should store its own name and must contain the following methods: getCount(), which returns the (arbitrary, fixed) number of items in inventory, and getPrice(), which computes the price using the formula price = price0 - k*log(count), where price0 and k are arbitrary, fixed variables belonging to the object.
  3. The article "Working with Big Data in Bioinformatics" describes software that reads many small strings and increments a set of counters for each string. The overall structure of their code consists of a fast C++ library, a Python wrapper, and Python scripts. For each of the following routines, state which of those three categories you would place it in, and why.
    1. A class that creates C++ objects representing counters for sequence data and that contains methods for translating the counts to numpy arrays.
    2. A script that creates a plot of the k-mer counts in a subset of the data.
    3. A function reading and parsing files containing genomic sequence data.
    4. A script installing the complete Khmer package (compiling the C++ library, copying the Python package, etc.)
  4. Explain (without trying to solve their problems) why each of the following quotes from the article might be relevant to the performance of their code:
    1. "We expected the highest traffic to be in the k-mer counting logic."
    2. "Redundant calls to the toupper function were present in the highest traffic regions of the code."
    3. "Input of genomic reads was performed line-by-line and on demand and without any readahead tuning."
    4. "A copy-by-value of the genomic read struct [was] performed for every parsed and valid genomic read."
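For reference while working on item 1, the three kinds of parameter notation can be illustrated with toy functions. The names greet, collect, and options are hypothetical illustrations, not the required solutions: a default value is used when the caller omits the argument, *args collects extra positional arguments into a tuple, and **kwargs collects keyword arguments into a dict.

```python
def greet(name="world"):
    # "world" is used only when the caller omits the argument
    return "hello, " + name

def collect(*args):
    # args is a tuple holding every positional argument, in order
    return args

def options(**kwargs):
    # kwargs is a dict mapping each keyword name to its value;
    # return the (key, value) pairs sorted by key
    return sorted(kwargs.items())

print(greet())                 # hello, world
print(collect(1, "a", [2]))    # (1, 'a', [2])
print(options(b=2, a=1))       # [('a', 1), ('b', 2)]
```

Because args is an ordinary tuple and kwargs an ordinary dict, the usual tuple and dict operations (len, indexing, key lookup) apply to them inside the function body.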

Codes

Power function with logarithmic run time in n (linear in the size of n):
<source lang="python">
def pow(x, n):
    """Returns the number x raised to the integer power, n.

    >>> pow(2, 4)
    16
    >>> pow(3, 2)
    9
    >>> pow(5, 0)
    1

    complexity = O(log n) = O(m), where m = # digits in n
    """
    if n < 1:
        return 1  # correct for n=0 (negative n is not handled)
    elif n == 1:
        return x
    elif n % 2 == 0:
        hp = pow(x, n // 2)  # integer division keeps n an int
        return hp*hp
    else:  # 3, 5, 7, ...
        hp = pow(x, (n - 1) // 2)
        return x*hp*hp
</source>
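The recursive pow above halves n at every call. An iterative sketch of the same square-and-multiply idea makes the "linear in the size of n" claim easy to see: the loop runs once per binary digit of n. The name pow_iter is hypothetical, and the sketch assumes n is a non-negative int. It also shows how docstring examples like the ones in pow can be checked with Python's standard doctest module, which ties in with the Code Testing topic.

```python
import doctest

def pow_iter(x, n):
    """Return x raised to the non-negative integer power n.

    Iterative square-and-multiply: one loop pass per binary digit of n.

    >>> pow_iter(2, 10)
    1024
    >>> pow_iter(3, 5)
    243
    >>> pow_iter(7, 0)
    1
    """
    result = 1
    base = x
    while n > 0:
        if n & 1:        # lowest binary digit of n is 1: multiply it in
            result *= base
        base *= base     # base takes the values x, x**2, x**4, x**8, ...
        n >>= 1          # shift to the next binary digit
    return result

if __name__ == "__main__":
    doctest.testmod()  # runs the >>> examples in the docstrings; silent if all pass
```

Running the file directly prints nothing when every docstring example passes; `python -m doctest -v file.py` lists each check.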

Using the python-geocoder-0.2 interface to Google's web API to get distances:
<source lang="python">
from geocode.google import GoogleGeocoderClient
from numpy import array, arccos, clip, cos, dot, pi, sin

geocoder = GoogleGeocoderClient(False)  # must specify the sensor parameter explicitly

def to_xyz(phi, th):
    # unit vector from spherical polar angles (radians):
    # phi = colatitude (angle from the north pole), th = longitude
    s = sin(phi)
    return array([s*cos(th), s*sin(th), cos(phi)])

def to_polar(lat, lon):
    # latitude/longitude in degrees -> (colatitude, longitude) in radians
    return (90 - float(lat))*pi/180.0, float(lon)*pi/180.0

def dist(a, b):
    # distance in kilometers across a perfect sphere of radius 6370 km;
    # clip keeps round-off from pushing the dot product outside [-1, 1]
    return 6370*arccos(clip(dot(to_xyz(*a), to_xyz(*b)), -1.0, 1.0))

def get_loc(name):
    result = geocoder.geocode(name)
    if result.is_success():
        return to_polar(*result.get_location())
    else:
        print("Geocoding failed")
        return (0.0, 0.0)

a = get_loc("Lowry Park Zoo")  # spherical polar
b = get_loc("MOSI, Tampa, FL")

print(dist(a, b))
</source>
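The arccos formula above loses precision for points only a few kilometers apart, because the cosine of a small angle is very close to 1. The haversine formula, swapped in here as an alternative rather than taken from the original code, is better conditioned for short distances. This sketch takes latitude and longitude in degrees and uses the same 6370 km sphere; the name haversine_km is hypothetical. It needs no web access, so the distance logic can be checked offline against simple cases.

```python
from math import asin, cos, pi, sin, sqrt

R_EARTH_KM = 6370.0  # same sphere radius as the dist() function above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1 = lat1*pi/180.0
    p2 = lat2*pi/180.0
    dphi = p2 - p1                    # difference in latitude (radians)
    dlam = (lon2 - lon1)*pi/180.0     # difference in longitude (radians)
    h = sin(dphi/2)**2 + cos(p1)*cos(p2)*sin(dlam/2)**2
    # min() guards against round-off pushing h slightly above 1
    return 2*R_EARTH_KM*asin(sqrt(min(1.0, h)))

# two points on the equator, 90 degrees of longitude apart: a quarter circumference
print(haversine_km(0, 0, 0, 90))
# north pole to south pole: half a circumference
print(haversine_km(90, 0, -90, 0))
```

The code above passes result.get_location() to to_polar as latitude and longitude in degrees; assuming that is what it returns, the same pair could be fed to haversine_km directly.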