How to implement LSH by MapReduce? -
suppose wish implement local sensitive hashing(lsh) mapreduce. specifically, assume chunks of signature matrix consist of columns, , elements key-value pairs key column number , value signature (i.e., vector of values).
(a) show how produce buckets bands output of single mapreduce process. hint: remember map function can produce several key-value pairs single element.
(b) show how mapreduce process can convert output of (a) list of pairs need compared. specifically, each column i, there should list of columns j > needs compared.
(a)
- map: elements , signature input, produce key-value pairs (bucket_id, element)
- reduce: produce buckets bands output, i.e. (bucket_id, list(elements))
map(key, value: element): split item bands band in bands: sig in band: key = hash(sig) // key = bucket id collect(key, value) reduce(key, values): collect(key, values)
(b)
- map: output of (a) input, produce list of combination in same bucket, i.e. (bucket_id, list(elements)) -> (bucket_id, combination(list(elements))), combination() 2 elements chosen same bucket.
- reduce: output item pairs need compared, specifically, each column i, there should list of columns j > needs compared.
map(key, value): itema, itemb in combinations(value) key = (itema.id, itemb.id) collect(key, [itema, itemb]) reduce(key, values): collect(key, values)
Comments
Post a Comment