Top 0.6% solution to LeetCode's Problem 1
Problem 1 should be among the most attempted competitive programming problems, as I assume most people's journey on LeetCode start by solving it.
After submitting an Accepted solution, LeetCode also provides a distribution of the runtimes achieved by other contestants, which got me wondering whether by now I knew enough about algorithms, low-level C programming and x86 processor performance characteristics to come up a with a solution among the topmost percentiles.
Version 1: Top 33%
To get our feet wet, here's a correct solution by following the most naive approach of enumerating every possible pair:
// V1: runtime 104 ms, 108 ms #define F(I,Z,N) for(int(I)=(Z);(I)<(N);(I)++) int result[2]; int* twoSum(int* nums, int numsSize, int target, int* returnSize) { int* ns = nums; int N = numsSize; F(i,0,N-1) F(j,i+1,N) { int a = ns[i]; int b = ns[j]; if (a + b == target) { result[0] = i; result[1] = j; *returnSize = 2; return &result[0]; } } return NULL; }
No big surprises, O(N^2)
is slow, but we must get it correct before getting it fast. This makes sure I understood the problem correctly.
Version 2: Top 8%
Now, a two-pointer approach, after sorting the input array. This is O(N log N)
, as even though the two-pointer scan is O(N)
, the overall time complexity is dictated by the quicksort performed at the start.
// V2: runtime 10 ms #define F(I,Z,N) for(int(I)=(Z);(I)<(N);(I)++) int result[2]; typedef struct { int i; int x; } pair; pair vs[10001]; int cmp(const void* a, const void* b) { int l = ((pair*)a)->x; int r = ((pair*)b)->x; if (r == l) { return 0; } return (l < r) ? -1 : 1; } int* twoSum(int* nums, int numsSize, int target, int* returnSize) { int* ns = nums; int N = numsSize; F(i,0,N) { vs[i].i = i; vs[i].x = ns[i]; } qsort(vs, N, sizeof(pair), cmp); int i = 0; int j = N-1; while (i < j) { int l = vs[i].x; int r = vs[j].x; int s = l + r; if (s == target) { result[0] = vs[i].i; result[1] = vs[j].i; *returnSize = 2; return &result[0]; } if (s < target) { i++; } else { j--; } } *returnSize = 0; return NULL; }
Version 3: Top 0.6%
Finally, this is the best solution I could come up with, after accepting the fact that a hashing approach would be needed -- which is cumbersome in C, as neither the language nor the standard library provide a hash table.
At first I thought that the modulus operator would be slow, but from results the benefit of using the modulus of a prime number to better disperse the entries is better than using a bitwise mask -- that, while faster to compute, is more prone to collisions.
The prime number chosen for the table size is the smallest prime which can store 2x the maximum number of entries (to keep a worst-case 50% hash table occupancy).
Also, apparently, a 64-bit modulus (using int64_t
) is faster than the 32-bit modulus (using int
) for the LeetCode machines, which is surprising to me.
Anyway, here's the code:
// V3: runtime 3 ms, 4 ms, 3 ms #define F(I,Z,N) for(int(I)=(Z);(I)<(N);(I)++) #define TABSIZE 20011L typedef int64_t i64; typedef struct { int k; int v; } entry; entry table[TABSIZE]; void table_init() { memset(table, 0, sizeof(table)); } void table_add(i64 n, int v) { int idx = ((n + 1000000001L) % TABSIZE); while (table[idx].v != 0) { idx++; if (idx == TABSIZE) { idx = 0; } } table[idx].k = n; table[idx].v = v+1; } int table_get(i64 n) { if (n <= -1000000001 || n >= 1000000001) { return -1; } int idx = ((n + 1000000001L) % TABSIZE); while (table[idx].v != 0) { if (table[idx].k == n) { return table[idx].v-1; } idx++; if (idx == TABSIZE) { idx = 0; } } return -1; } #define MIN(i,j) (((i)<(j))?(i):(j)) #define MAX(i,j) (((i)>(j))?(i):(j)) int result[2]; int* twoSum(int* nums, int numsSize, int target, int* returnSize) { table_init(); int* ns = nums; int N = numsSize; F(i,0,N) { int c = target - ns[i]; int j = table_get(c); if (j == -1) { table_add(ns[i], i); } else { result[0] = MIN(i, j); result[1] = MAX(i, j); *returnSize = 2; return &result[0]; } } assert(0); return NULL; }
The approach above should be O(n)
in the typical case, although well-crafted input can make it run closer to O(n^2)
if the hash table degrades into linear probing too often.
This gets a runtime of 3ms, which beats 99.38% of submissions. The assert(0)
at the end is to make the Runtime Error message clearer, in case I hit it -- a code path I'm assuming it's impossible to hit.