Dynamic Prog.
Rough notes for Manhattan Problem, Longest common subsequences and similarity.
Dynamic Programming
Weighted interval scheduling : Maximize the total weight of the bookings that are selected. Many problems have an inductive structure but there is the probabilty that problems will be repeated.
Memoization is checking id the value needed has already been computed before.
def fib(n):
if n in fiblist.keys():
return fiblist[n]
if n<=1:
return n
else:
value = fib(n-1) + fib(n-2)
fiblist[n] = value
return value
Solve problems in topological order of dependency, this way wehen we need to evaluate a problem,all the previous subproblems have been already calculated. Memoization is top down approach while Dynamic programming is a bottom up approach.
1. Grid path
Manhattan problem In a 10x5 grid moving from bottom left to top right , youll make 5 rights and 10 ups. Fix position of 5 moves means total paths are 15C5 or 15C10, both same values.
If a certain intersection is blocked, subtract paths from origin to that point and then from that point to destination. If 2,4 is blocked, 6C4 and 9C6 are eliminated. Computing number of possibilities Funnily enough if there is a barrier of holes, dynamic programming will be slower since memoization will not compute the region being unused.
2.Common subwords
Find the longest matching substring given two words.
Common sense one gives O(m*n2) i.e,by comparing every letter in i and j
Instead by using inductions:
LCW(i,j) = 1 + LCW(i+1,j+1)
from bottom right corner, each value is sum of 1 (if matched) and diagonal right
def LCW(u,v):
import numpy as np
(m,n) = (len(u),len(v))
lcw = np.zeroes(m+1,n+1)
maxclw = 0
for c in range(n-1,-1,-1):
for r in range(m-1,-1,-1):
if u[r] == v[c]:
lcw[r,c] = 1+lcw[r+1][c+1]
else:
lcw[r,c] = 0
if lcw[r,c] > maxlcw:
maxlcw = lcw[r,c]
return maxlcw
This is order O(mn)
3.Common subsequence
Subsequence is where words can be dropped. diff command is an example that finds longest common subsequence. It depends on LCS(i+1,j+1), max[L(i,j+1),L(i+1,j)] If i=j, value is 1 + value of(i+1,j+1) else value is max(LCS(i+1,j),LCS(i,j+1))
def LCS(u,v):
import numpy as np
(m,n) = (len(u),len(v))
lcs = np.zeros((m+1,n+1))
for c in range(n-1,-1,-1):
for r in range(m-1,-1,-1):
if u[r] == v[c]:
lcs[r,c] = 1 + lcs[r+1,c+1]
else:
lcs[r,c] = max(lcs[r+1,c],
lcs[r,c+1])
return(lcs[0,0])
4.Document Similarity
Compute the number of insertions,deletions and substitutions. Levenshtein’s distance is minimum number of editing operations needed to transform one document to the other. We realize that deleting and inserting are symmetric, changing abc to xbc involves deleting a and x or deleting a inserting x or deleting x addign a.
To transform u to v: If ai = bj, nothing is to be done If ai not equal to bj we have three options:
- substitute ai by bj
- delete ai
- insert bj before ai ED(i,j) = 1 + min { ED(i+1,j+1),ED(i+1,j),ED(i,j+1) } Base cases: ED(m,n) = 0 ED(i,n) i.e, if second string is empty, them we do m-i deletes on first string ED(m,j) i.e, if second string is empty then do m-j deletes on the second string
Ed(i,j) is value at ED(i+1,j+1) or 1+ min(ED(i+1,j+1),ED(i,j+1),ED(i+1,j)) Moving right is insertion, i.e r and e moving down is deleting
def ED(u,v):
import numpy as np
(m,n) = (len(u),len(v))
ed = np.zeroes((m+1,n+1))
for i in range(m-1,-1,-1):
ed[i,n] = m-i
for j in range(n-1,-1,-1):
ed[m,j] = n-j
for j in range(n-1,-1,-1):
for i in range(m-1,-1,-1):
if u[i] == v[j]:
ed[i][j] = ed[i+1][j+1]
else:
ed[i][j] = 1 + min(ed[i+1][j+1],ed[i,j+1],ed[i+1,j])
return (ed[0][0])
Another algo of time O(mn)
5.Multiplying matrices
A[i][k]*B[k][j]
Overall cost is o(mnp)
where A = mxn and B is nxp
We aren't changing the process just finding an optimal order in which to compute them.
Cost C(0,n-1) = C(0,k-1) + C(k,n-1) + r0rkCn-1
def C(dim):
#dim is a dimension matrix where each value is [(r_i,c_i)..]
import numpy as np
n = dim.shape[0]
C = np.zeros((n,n))
#Primary diagonal is zero
for i in range(n):
C[i][i] = 0
for diff in range(1,n):
for i in range(0,n-diff):
j = i+diff
C[i][j] = C[i][i] + C[i+1][j] + dim[i][0]*dim[i+1][0]*dim[j][i]
for k in range(i+1,j+1):
C[i,j] = min(C[i,j],C[i,k-1]+C[k,j]+dim[i][0]*dim[k][0]*dim[j][i])
return(C[0,n-1])