Wahoo! I need to remember that the inorder traversal of a BST should be
sorted. This piece of trivia comes in handy for a variety of BST related
problems.
I also think being able to do a {pre,in,post}-order traversal recursively and
iteratively is a skill that I need to develop.
Valid Anagram
This one is a classic: `sorted(a) == sorted(b)`
Group Anagrams
Using product of prime numbers to create a key for anagrams is much faster than
sorting the characters in each word. It is also satisfyingly simple.
Encode and Decode Strings
My initial implementation was clumsy and prone to fail for edge-cases. A more
elegant solution is using something like:
```python
def encode(words):
return "".join("{}:{}".format(len(x), x) for x in words)
```
This is tricky because Python has variable-width integers, so relying on two's
complement to support the sum of negative numbers results in infinite
recursion. I know three ways to combat this:
1. Use Java.
2. Conditionally branch and handle either addition or subtraction accordingly.
3. Use a mask to enforce fixed-width integers in Python.
Looks like "Rotate Image" is the only Matrix problem that remains. It was nice
to learn more about "Backtracking" -- a term I often encounter -- while
attempting to solve "Word Search".
From my current understanding, it is like Brute Force but with
short-circuiting. It also seems quite similar to Depth First Search, and I'm
currently unaware of how DFS and Backtracking differ. I'm hoping to learn more
though.
I did these during my flight from LON->NYC without wifi. I managed to get both
correct on the first attempt although I did not find the *optimal* solution for
"Reorder List". IMO "Reorder List" is the best Linked List question I've seen
because it covers a few essential Linked List tricks.
Looks like I should prioritize the following topics:
- Dynamic Programming
- String
- Graph
Although I'm not sure how common DP questions are in interviews, DP is a useful
dragon to slay IMO.
TeamBlind.com hosts a curated list of DS&As questions from LeetCode.com that the
author claims appropriately samples the topics worth exploring. I'm creating an
offline list so that I can track my progress and work while I'm traveling.
I've subtly been implementing breadth-first traversals in graphs
incorrectly. The change is subtle, but updating `seen` needs to happen
immediately after queuing an item.
The results will remain the same, but the runtimes will differ dramatically. I
didn't notice this until I attempted to complete LeetCode's "count islands"
challenge, and LeetCode rejected my solution because it could not finish before
timing out. After looking at other candidates' solutions and comparing them to
mine, I couldn't see any difference... except for this subtle difference.
This SO answer provides a helpful explanation:
https://stackoverflow.com/questions/45623722/marking-node-as-visited-on-bfs-when-dequeuing
The take-away lesson here is to always call `seen.add(..)` immediately after
enqueuing.
The space cost is O(n). The runtime cost of enqueue is O(1); the runtime cost of
dequeue is O(n). Using the "accounting method", the cost of an item in the
system is O(1). Here's why:
+------------+----------------------------+------+
| enqueue | push onto lhs | O(1) |
+------------+----------------------------+------+
| lhs -> rhs | pop off lhs; push onto rhs | O(1) |
+------------+----------------------------+------+
| dequeue | pop off rhs | O(1) |
+------------+----------------------------+------+
The bottom-up solution run in O(n) time instead of O(2^n) time, which the
recursive solution runs as:
```
def fib(n):
return fib(n - 2) + fib(n - 1)
```
Remember that exponential algorithms are usually recursive algorithms with
multiple sibling calls to itself.
Write a predicate for checking if a linked-list contains a cycle. For additional
practice, I also implemented a function that accepts a linked-list containing a
cycle and returns the first element of that cycle.
I believe the previous solution is invalid. This solution works and it should be
more time and space efficient.
Space-wise our stack grows proportionate to the depth of our tree, which for a
"balanced" BST should be log(n). Doing a BFT on a BST results in memory usage of
n because when we encounter the leaf nodes at the final level in the tree, they
will be 1/2 * n for a balanced BST.
InterviewCake.com has a section on Facebook's interview, so I'm attempting to
solve all of the problems on there even if that means I'm resolving
problems. The more practice, the better. Right?
URL: interviewcake.com/facebook-interview-questions
This is the mother of dynamic programming algorithms in my opinion. It computes
the minimal "edit distance" between two input strings where an edit is
considered one of:
- inserting a character into `a`
- deleting a character from `a`
- substituting a character in `a` with a character from `b`
It took me awhile to grok the algorithm, but I implemented this from my
understanding of something that I read ~3 nights prior, so I must've understood
what I read. Good news!
This morning, I attended the "Interview Club" and was asked this question by the
interviewer in front of ~20 FTEs. While I struggled to fully solve it during the
abridged (i.e. 20 minute) timeslot, I completed the problem afterwards.
Here is my solution.
Create a suffix tree from an input string. This implementation uses a stack to
control the flow of the program.
I expected this attempt to be easier than my first attempt, but surprisingly, it
was similarly difficult. It took me ~30-45 minutes to successfully implement
this function, and I'm still not pleased with the final result.
While it took me awhile to implement, this exercise was definitely worth
doing. I think there should be a more elegant way to construct the tree using
maybe a stack, but I couldn't find it.
All of this was part of a larger effort to search a string for a variety of
patterns. The solution is to compile the string into a suffix tree and then
search the suffix tree for each of the patterns.
I'm glad I didn't gloss over this exercise.
Tonight I learned that random sample where each element in the sampling corpus
has an equal likelihood of being chosen is a brand of algorithms known as
"reservoir sampling".
- Implement random.shuffle(..)
- Implement random.choice(..)
Surprisingly, candidates are expected to encounter problems like this during
interviews.
Given an input like "gello" suggest an correction like "hello".
This is a proof-of-concept problem for writing a simplistic auto-correction
algorithm for a mobile device.
This algorithm is pretty interesting because it runs in linear time with respect
to the length of the `corpus` string. It does this by using a sliding window
hash. This hash -- because it's a sliding window -- runs in constant time for
each iteration; we're only adding and subtracting one character each time and
not re-hashing the whole "window".
When our hashes match, only then do we compare the "window" to the
`pattern`. String comparisons are linear because they compare each character to
each character one at a time. But because we only compare strings when are
hashes match (a check which runs in constant time), this spares us the
performance hit.
Firstly, implement a function that adds two arguments together... without using
the `+` operator. I need to drill this problem. Thankfully I took a Coursera
course that taught me how to make a half-adder and a full-adder, but the
recommended solution for this is a bit more difficult.
I was always curious how hashing functions were implemented, so I read about the
"polynomial rolling hash function", and I decided implementing it would be a
good exercise. After writing that, writing a hash table was simple.
Write a function to modify an array of integers in-place such that all of the
zeroes in the array are at the end, and the order of the other integers is not
changed.
This solution operates in O(n) time instead of O(n*log(n)) time, which
surprisingly isn't *that* big of a difference...
Consider a size of n of 10M...
1) ~10s
2) ~0.5s
So, yes, the O(n*log(n)) will take 100x longer to complete, but for an enormous
input size of 10M elements, it can still complete in under a minute. The
difference between that and the second, faster, algorithm, is just 9s.
Write a function that reads a string of compressed XML and outputs the
decompressed version.
Note to self: Now that I'm growing more comfortable writing parsers, I'd like to
become equally comfortable writing pretty-printers.