This is the mother of dynamic programming algorithms in my opinion. It computes
the minimal "edit distance" between two input strings where an edit is
considered one of:
- inserting a character into `a`
- deleting a character from `a`
- substituting a character in `a` with a character from `b`
It took me awhile to grok the algorithm, but I implemented this from my
understanding of something that I read ~3 nights prior, so I must've understood
what I read. Good news!
This morning, I attended the "Interview Club" and was asked this question by the
interviewer in front of ~20 FTEs. While I struggled to fully solve it during the
abridged (i.e. 20 minute) timeslot, I completed the problem afterwards.
Here is my solution.
Create a suffix tree from an input string. This implementation uses a stack to
control the flow of the program.
I expected this attempt to be easier than my first attempt, but surprisingly, it
was similarly difficult. It took me ~30-45 minutes to successfully implement
this function, and I'm still not pleased with the final result.
While it took me awhile to implement, this exercise was definitely worth
doing. I think there should be a more elegant way to construct the tree using
maybe a stack, but I couldn't find it.
All of this was part of a larger effort to search a string for a variety of
patterns. The solution is to compile the string into a suffix tree and then
search the suffix tree for each of the patterns.
I'm glad I didn't gloss over this exercise.
Tonight I learned that random sample where each element in the sampling corpus
has an equal likelihood of being chosen is a brand of algorithms known as
"reservoir sampling".
- Implement random.shuffle(..)
- Implement random.choice(..)
Surprisingly, candidates are expected to encounter problems like this during
interviews.
Given an input like "gello" suggest an correction like "hello".
This is a proof-of-concept problem for writing a simplistic auto-correction
algorithm for a mobile device.
This algorithm is pretty interesting because it runs in linear time with respect
to the length of the `corpus` string. It does this by using a sliding window
hash. This hash -- because it's a sliding window -- runs in constant time for
each iteration; we're only adding and subtracting one character each time and
not re-hashing the whole "window".
When our hashes match, only then do we compare the "window" to the
`pattern`. String comparisons are linear because they compare each character to
each character one at a time. But because we only compare strings when are
hashes match (a check which runs in constant time), this spares us the
performance hit.
Firstly, implement a function that adds two arguments together... without using
the `+` operator. I need to drill this problem. Thankfully I took a Coursera
course that taught me how to make a half-adder and a full-adder, but the
recommended solution for this is a bit more difficult.
I was always curious how hashing functions were implemented, so I read about the
"polynomial rolling hash function", and I decided implementing it would be a
good exercise. After writing that, writing a hash table was simple.
Write a function to modify an array of integers in-place such that all of the
zeroes in the array are at the end, and the order of the other integers is not
changed.
This solution operates in O(n) time instead of O(n*log(n)) time, which
surprisingly isn't *that* big of a difference...
Consider a size of n of 10M...
1) ~10s
2) ~0.5s
So, yes, the O(n*log(n)) will take 100x longer to complete, but for an enormous
input size of 10M elements, it can still complete in under a minute. The
difference between that and the second, faster, algorithm, is just 9s.
Write a function that reads a string of compressed XML and outputs the
decompressed version.
Note to self: Now that I'm growing more comfortable writing parsers, I'd like to
become equally comfortable writing pretty-printers.
After a five year hiatus, I decided to attempt to solve the famous N queens
problem again. This time, instead of modeling the chess board using a
`[[Bool]]`, I'm using `[Integer]` where the `Integer` indicates which column has
a queen. This is a bit lighter in RAM.