From the Data Structures and Algorithms Class to Professional Software Engineering

A university professor who teaches the traditional “Data Structures and Algorithms” class at my engineering school reached out to former students and asked how the material had served them in their careers. My reply is reproduced here so that more people can benefit from it.

---

In the hiring process, yes, practicing competitive programming and data structures and algorithms problems is the centerpiece.

I must have done nearly a dozen problems similar to those on LeetCode or InterviewBit in interview settings. Usually one problem per interview, but sometimes two: one simple, and then another calibrated to how easily you solved the first.

Concretely, here are some problems I've encountered during interviews:

1. Given a list of integers and a target number T, return the list of all pairs that sum to T. There's an O(n) solution if map/hashtable lookups are O(1).

2. Given a partially filled Sudoku board, return all possible solutions (starting from the given configuration). One approach is DFS with backtracking, pruning placements that violate the rules.

3. Let's pretend we're building a build system. Given a list of dependencies: object_1 depends on source_1, source_2; object_2 depends on object_1; and so on (the rule being that each object can depend on a list composed of other objects and/or sources), give me a compilation order that satisfies the dependencies. The solution involves building the dependency graph and returning a topological sort.
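
For problem 1, a minimal Python sketch (the function name and the choice to return each matching pair once are my own):

```python
def pairs_summing_to(nums, target):
    """Return pairs (a, b) with a + b == target in one pass.

    O(n) expected time, assuming average-case O(1) dict operations.
    """
    seen = {}    # value -> how many times we've seen it so far
    pairs = []
    for x in nums:
        complement = target - x
        if seen.get(complement, 0) > 0:
            pairs.append((complement, x))
        seen[x] = seen.get(x, 0) + 1
    return pairs
```

The dict trades O(n) extra memory for dropping the naive nested loop from O(n^2) to O(n).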
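
For problem 2, a backtracking sketch in Python (0 marks an empty cell; representing the board as a list of lists and using a generator are my own choices):

```python
def sudoku_solutions(board):
    """Yield every completion of a 9x9 board (0 = empty cell).

    Depth-first search with backtracking: try each digit in the next
    empty cell, prune placements that violate a row/column/box, recurse.
    """
    empties = [(r, c) for r in range(9) for c in range(9) if board[r][c] == 0]

    def valid(r, c, d):
        if any(board[r][k] == d for k in range(9)):   # row conflict
            return False
        if any(board[k][c] == d for k in range(9)):   # column conflict
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)           # top-left of 3x3 box
        return all(board[br + i][bc + j] != d
                   for i in range(3) for j in range(3))

    def dfs(k):
        if k == len(empties):
            yield [row[:] for row in board]           # snapshot a solution
            return
        r, c = empties[k]
        for d in range(1, 10):
            if valid(r, c, d):
                board[r][c] = d
                yield from dfs(k + 1)
                board[r][c] = 0                       # backtrack

    yield from dfs(0)
```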
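
For problem 3, a sketch using Kahn's algorithm (the dict-of-dependency-lists input shape is an assumption about how the problem data arrives):

```python
from collections import deque

def build_order(deps):
    """deps maps each target to the list of things it depends on.

    Returns a compilation order satisfying all dependencies, or raises
    ValueError if there's a cycle. Kahn's algorithm: repeatedly emit
    nodes whose dependencies have all been emitted already.
    """
    nodes = set(deps)
    for ds in deps.values():
        nodes.update(ds)                     # sources appear only here
    indegree = {n: 0 for n in nodes}         # unmet dependencies per node
    dependents = {n: [] for n in nodes}      # dep -> targets that need it
    for target, ds in deps.items():
        for d in ds:
            indegree[target] += 1
            dependents[d].append(target)
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for t in dependents[n]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```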

Having the intuition to do complexity analysis without pausing much serves, I think, as a positive differentiator in the interviewer's eyes.

Now, at work, complexity analyses are rarely explicitly required. The job more often involves avoiding solutions with poor complexity than finding a clever solution with lower complexity. In an engineering environment, other people (and future you) have to be able to understand and modify the code, so as a rule the solution that gets implemented is the simplest one that isn't unnecessarily inefficient.

A problem I've seen at work:

We have a list of students A (the school's system -- “the truth”) and a list of students B (my app's system). We need to compute the difference -- that is, find which students were added (and add them to the app), which were removed (revoke their app access), and which had their information changed (who the guardian is, the contact phone number, and so on).

Someone less careful might, for each student in list A, iterate through the entire list B without realizing this would be O(n^2).

Someone who has studied complexity and DS&A knows you just need to sort both lists in O(n log n), then keep one pointer into each list and advance the pointer in A, the pointer in B, or both -- an O(n) merge-style pass. Depending on the number of students, the difference in wall-clock time can be several orders of magnitude -- say, 1 second versus 15 minutes. Furthermore, in practice, databases like Postgres or MySQL keep records indexed in some variant of a B-tree, so they can hand you the rows already sorted -- another fact that comes naturally once you understand DS&A and databases.
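
The two-pointer pass described above could be sketched like this in Python (the `(student_id, info)` record shape is an assumption for illustration):

```python
def diff_students(truth, app):
    """Diff two lists of (student_id, info) records.

    Sort both by student_id in O(n log n), then a single merge-style
    pass with one pointer per list finds additions, removals, and
    changes in O(n).
    """
    truth = sorted(truth)
    app = sorted(app)
    added, removed, changed = [], [], []
    i = j = 0
    while i < len(truth) and j < len(app):
        (tid, tinfo), (aid, ainfo) = truth[i], app[j]
        if tid == aid:
            if tinfo != ainfo:
                changed.append(tid)      # same student, info differs
            i += 1
            j += 1
        elif tid < aid:                  # in the school system only
            added.append(tid)
            i += 1
        else:                            # in the app only
            removed.append(aid)
            j += 1
    added.extend(sid for sid, _ in truth[i:])    # leftovers in A
    removed.extend(sid for sid, _ in app[j:])    # leftovers in B
    return added, removed, changed
```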

Another common case where junior engineers stumble:

I have a list S and want to remove the elements listed in a list R.

The standard library offers S.Remove(x), where x is an element to be removed.

A DS&A student would know that if S is backed by an array, every removal shifts the remaining elements over, in O(n). So if you iterate through each element of R and call S.Remove(r), where r is the current element of R, you're writing O(n^2) code without realizing it.

(The alternative with better complexity, I think, would be to turn R into a hash table/hash set so that membership checks are O(1), then make a single pass over S, rebuilding it without the elements found in the set -- O(n) overall.)
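
A Python sketch of that hash-set approach (the names are mine; Python's own `list.remove` has the same element-shifting cost as the `S.Remove` described above):

```python
def remove_all(s, r):
    """Return s without the elements listed in r, in O(len(s) + len(r)).

    Calling s.remove(x) in a loop is O(n) per call (each removal shifts
    the tail of the underlying array), so O(n^2) overall. Building a set
    makes each membership check O(1) on average, and one pass rebuilds
    the list without the unwanted elements.
    """
    to_remove = set(r)
    return [x for x in s if x not in to_remove]
```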

So yes, I admit that at work you'll almost never implement a data structure (beyond the occasional queue or stack) that isn't already in the standard library. This, I imagine, is the most common student objection, in the style of “I'll never implement Dijkstra's algorithm.”

Even though that's valid, the “why” in “why should I learn DS&A” is that any real software system you'll have to work with was written and designed by other people who already understood these concepts well.

If you don't have that foundation, existing systems will be harder to grasp, because you will lack the vocabulary.

Nowhere will it be written down why certain classical DS&A decisions were made; they're assumed to be common knowledge among practicing software engineers.

Thus, knowing data structures and algorithms is, in my opinion, the fundamental vocabulary you will be expected to be fluent in.