Playing with joins in spark

It’s a very common problem across whole data industry, people are struggling to reduce the time complexity, they bought big machines, they moved to spark but yet, there are few problems which can’t be solved by just using something, it requires a deeper insight to literally feel how the framework works and why it’s very slow even with considerable amount of resources.

Here we’ll be discussing how spark treats a join and figure out how to join.

Continue reading “Playing with joins in spark”

Create a website or blog at WordPress.com

Up ↑