Skip to main content

Command Palette

Search for a command to run...

A better way to remove duplicate Objects from Array in Javascript

Understanding your tools to get the maximum efficiency.

Updated
4 min read
A better way to remove duplicate Objects from Array in Javascript
V

Figuring out glue to stick lego pieces that don't like each other much :p

Have you ever been in a scenario, where you had an array that contained some complex data(for the sake of our discussion, consider Javascript Object) and you wanted to remove all duplicate entries from that Array.

Well that is relatively easy

const arr = [1, 2, 4, 5, 6, 5, 4, 6, 6, 6, 3];
function removeDuplicates(arr) {
  const result = [];
  arr.forEach(elem => {
    if(!result.includes(elem)) result.push(elem);
  });
  return result;
}

result-one.png

This is great, we have solved our problem. And if you know your language slightly better, you can further simplify the process by using Set.

const result = new Set(arr);

This will give the same result as the algorithm above, with some added benefits that come with Sets(constant lookup time for elements in set being one).

Both these solutions work great with primitive data, but in case of non primitive data, both the above algorithms seems to fail, let's look at why ?

Looking at the code sample below, what do you think will be the output of the last line.

let people = [{name: 'Mike', age: 20}, {name: 'Mike', age: 20}];
console.log(people[0] === people[1]);

The answer is false(I know, you knew but still :p).

This is because Objects in JavaScript are reference types, meaning when you compare two objects instead of comparing Object schema (a.k.a keys), references are compared. And since we are creating new objects, we get different references each time, and thus the answer is false.

Let's try to write a new algorithm for removing objects from array.

function removeDuplicates(arr) {

    const result = [];
    const duplicatesIndices = [];

    // Loop through each item in the original array
    arr.forEach((current, index) => {

        if (duplicatesIndices.includes(index)) return;

        result.push(current);

        // Loop through each other item on array after the current one
        for (let comparisonIndex = index + 1; comparisonIndex < arr.length; comparisonIndex++) {

            const comparison = arr[comparisonIndex];
            const currentKeys = Object.keys(current);
            const comparisonKeys = Object.keys(comparison);

            // Check number of keys in objects
            if (currentKeys.length !== comparisonKeys.length) continue;

            // Check key names
            const currentKeysString = currentKeys.sort().join("").toLowerCase();
            const comparisonKeysString = comparisonKeys.sort().join("").toLowerCase();
            if (currentKeysString !== comparisonKeysString) continue;

            // Check values
            let valuesEqual = true;
            for (let i = 0; i < currentKeys.length; i++) {
                const key = currentKeys[i];
                if ( current[key] !== comparison[key] ) {
                    valuesEqual = false;
                    break;
                }
            }
            if (valuesEqual) duplicatesIndices.push(comparisonIndex);

        } // end for loop

    }); // end arr.forEach()

    return result;
}

Basically what we are doing here is using two nested loops to check each element with all the remaining elements in the array, matching keys of objects, adding all future indexes of the same object in duplicateIndexes and push current elem to result.

let arr = [
  {
    firstName: "Jon",
    lastName: "Doe"
  },
  {
    firstName: "example",
    lastName: "example"
  },
  {
    firstName: "Jon",
    lastName: "Doe"
  }
];

removeDuplicates(arr);

result-two.png

We get the correct result, so the algorithm works perfectly fine, but compared to the previous implementation, this one is significantly slower.

But I think we can do better, let's look at how.

The problem in our original algorithm is, we can't compare Objects like we can compare Numbers and Strings, but what if we could ?

Well here is my little trick for that,

function removeDuplicates(arr) {
  const result = [];
  const lookedAtNodes = new Set();

  // prepass
  let modifiedArr = arr.map(JSON.stringify);

  modifiedArr.forEach((elem, idx) => {
    if(!lookedAtNodes.has(elem)) {
      result.push(arr[idx]);
      lookedAtNodes.add(elem);
    }
  });

  return result;
}

What we are doing here, is trading space to make a faster algorithm, by creating a few intermediate data structures. We create an array to store the JSON.stringifyed string version of objects, a set to store the strings that we have looked at, till at any given point in the functions execution cycle and finally the results array, which we will return at the end as the final result.

final.png

Benchmarks

I did some tests and it turns out this solution is pretty scaleable. Below are some of the results I found.

  • For 100 elements or so, it takes about 5.5ms where as the nested loop approach takes 2.2ms.
  • For 500 elements or so, it takes 5.7ms whereas the nested loop approach takes 11.7ms.
  • For 1500 elements or so, it takes 3.886ms whereas the nested loop approach takes 32.82ms.
  • For 8000 elements or so, it takes 5.57ms whereas the nested loop approach takes 60.71ms.

And after that I was obviously bored, so If someone finds this useful and does some testing on a bigger and may be more real world data, I would love to know stats.

If you want to talk more about the implementation or anything else, you can find me on Instagram or Twitter as @vipulbhj

Thank you so much for reading, don’t forget to share If you find the information useful.

D

Why not just use JSON class to transform it in and out?


const people = [{name: 'Mike', age: 20}, {name: 'Mike', age: 20}];
const dd = [... new Set(people.map(JSON.stringify))].map(JSON.parse)

Simple !

V

Because doing JSON.parse will be expensive in comparison to the current implementation.

Imagine very large JSON Objects, parse is far more expensive than a loop.

I will try to run both the implementations on the big dataset and share the results here.

D

Vipul Bhardwaj

I’m not convinced of that, I would have to see the benchmarks. Loops like forEach are sufficiently taxing themselves don’t forget, as well as the referencing and hoisting of all those references.

V

@thebarefootdev

const data = require('./generated.json'); // 7000 plus rows in the test data.


function algorithmSuggestedInComments(arr) {
  return [... new Set(arr.map(JSON.stringify))].map(JSON.parse);
}

function removeDuplicates(arr) {
  const result = [];
  const lookedAtNodes = new Set();

  // prepass
  const modifiedArr = arr.map(JSON.stringify);

  modifiedArr.forEach((elem, idx) => {
    if(!lookedAtNodes.has(elem)) {
      result.push(arr[idx]);
      lookedAtNodes.add(elem);
    }
  });

  return result;
}

console.time("Algorithm suggested in Comments");
algorithmSuggestedInComments(data);
console.timeEnd("Algorithm suggested in Comments");



console.time("Algorithm mentioned in the article");
removeDuplicates(data);
console.timeEnd("Algorithm mentioned in the article");
> node ./algo.js
Algorithm suggested in Comments: 111.192ms
Algorithm mentioned in the article: 54.276ms
D

Vipul Bhardwaj

I see you are running this in node not browser. What are your HW specs for this (cpu speed ,etc)?

V

@thebarefootdev

I have also made a video of the whole process, you can check it out here.

https://www.dropbox.com/s/0z7m740wxii4p98/Comment%20Algorithm%20Experiment.mov?dl=0

D

Vipul Bhardwaj

I may do, but remember benchmarking is more than console timing 😉

V

@thebarefootdev

Yes, but I wanted to keep it simple and that was the metic used in the benchmarks on the article. Also specs of my system won't matter since I am running both on the same machine.

If you do a more comprehensive comparisons, do let me know.

P.S: Please leave a like if you like the content :p