Comparison of these two algorithms?
Clash Royale CLAN TAG#URR8PPP
up vote
18
down vote
favorite
So I'm presented with a problem that states. "Determine if a string contains all unique characters"
So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.
private static boolean allUniqueCharacters(String s)
Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);
else
return false;
return true;
According to the book I am reading this is the "optimal solution"
public static boolean isUniqueChars2(String str)
if (str.length() > 128)
return false;
boolean char_set = new boolean[128];
for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);
if (char_set[val])
return false;
char_set[val] = true;
return true;
My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?
Thank you.
java algorithm
 |Â
show 11 more comments
up vote
18
down vote
favorite
So I'm presented with a problem that states. "Determine if a string contains all unique characters"
So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.
private static boolean allUniqueCharacters(String s)
Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);
else
return false;
return true;
According to the book I am reading this is the "optimal solution"
public static boolean isUniqueChars2(String str)
if (str.length() > 128)
return false;
boolean char_set = new boolean[128];
for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);
if (char_set[val])
return false;
char_set[val] = true;
return true;
My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?
Thank you.
java algorithm
15
Well, the book "solution" fails in Java because characters are 16 bits and can have a value from0
to65535
, so a 128-byte array won't work.
â Jim Garrison
Aug 21 at 7:43
22
They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
â Amadan
Aug 21 at 7:43
5
Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
â Jilles van Gurp
Aug 21 at 8:54
7
I suggest throwing the book away if the author thinks there are only 128 characters
â phuclv
Aug 21 at 9:33
3
To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
â Nathan Adams
Aug 21 at 9:43
 |Â
show 11 more comments
up vote
18
down vote
favorite
up vote
18
down vote
favorite
So I'm presented with a problem that states. "Determine if a string contains all unique characters"
So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.
private static boolean allUniqueCharacters(String s)
Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);
else
return false;
return true;
According to the book I am reading this is the "optimal solution"
public static boolean isUniqueChars2(String str)
if (str.length() > 128)
return false;
boolean char_set = new boolean[128];
for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);
if (char_set[val])
return false;
char_set[val] = true;
return true;
My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?
Thank you.
java algorithm
So I'm presented with a problem that states. "Determine if a string contains all unique characters"
So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.
private static boolean allUniqueCharacters(String s)
Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);
else
return false;
return true;
According to the book I am reading this is the "optimal solution"
public static boolean isUniqueChars2(String str)
if (str.length() > 128)
return false;
boolean char_set = new boolean[128];
for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);
if (char_set[val])
return false;
char_set[val] = true;
return true;
My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?
Thank you.
java algorithm
java algorithm
edited Aug 21 at 7:54
OldCurmudgeon
50.3k1179159
50.3k1179159
asked Aug 21 at 7:39
fsdff
1759
1759
15
Well, the book "solution" fails in Java because characters are 16 bits and can have a value from0
to65535
, so a 128-byte array won't work.
â Jim Garrison
Aug 21 at 7:43
22
They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
â Amadan
Aug 21 at 7:43
5
Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
â Jilles van Gurp
Aug 21 at 8:54
7
I suggest throwing the book away if the author thinks there are only 128 characters
â phuclv
Aug 21 at 9:33
3
To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
â Nathan Adams
Aug 21 at 9:43
 |Â
show 11 more comments
15
Well, the book "solution" fails in Java because characters are 16 bits and can have a value from0
to65535
, so a 128-byte array won't work.
â Jim Garrison
Aug 21 at 7:43
22
They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
â Amadan
Aug 21 at 7:43
5
Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
â Jilles van Gurp
Aug 21 at 8:54
7
I suggest throwing the book away if the author thinks there are only 128 characters
â phuclv
Aug 21 at 9:33
3
To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
â Nathan Adams
Aug 21 at 9:43
15
15
Well, the book "solution" fails in Java because characters are 16 bits and can have a value from
0
to 65535
, so a 128-byte array won't work.â Jim Garrison
Aug 21 at 7:43
Well, the book "solution" fails in Java because characters are 16 bits and can have a value from
0
to 65535
, so a 128-byte array won't work.â Jim Garrison
Aug 21 at 7:43
22
22
They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
â Amadan
Aug 21 at 7:43
They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
â Amadan
Aug 21 at 7:43
5
5
Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
â Jilles van Gurp
Aug 21 at 8:54
Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
â Jilles van Gurp
Aug 21 at 8:54
7
7
I suggest throwing the book away if the author thinks there are only 128 characters
â phuclv
Aug 21 at 9:33
I suggest throwing the book away if the author thinks there are only 128 characters
â phuclv
Aug 21 at 9:33
3
3
To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
â Nathan Adams
Aug 21 at 9:43
To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
â Nathan Adams
Aug 21 at 9:43
 |Â
show 11 more comments
6 Answers
6
active
oldest
votes
up vote
12
down vote
accepted
As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.
Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.
For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char
values there are. You would also need to remove the check for a string longer than 128.
This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.
add a comment |Â
up vote
3
down vote
Both algorithms have time complexity of O(N). The difference is in their space complexity.
The book's solution will always require storage for 128 characters - O(1)
, while your solution's space requirement will vary linearly according to the input - O(N)
.
The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
 |Â
show 2 more comments
up vote
2
down vote
The hashmap is in theory acceptable, but is a waste.
A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.
This adds a lot of overhead in terms of space and time, compared to a straight array.
Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.
As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
 |Â
show 1 more comment
up vote
1
down vote
Your algorithm is also O(1)
. You can think about complexity like how my algorithm will react to the change in amount of elements processed
. Therefore O(n)
and O(2n)
are effectively equal.
People are talking about O notation as growth rate here
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
add a comment |Â
up vote
1
down vote
Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.
add a comment |Â
up vote
0
down vote
The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k)
, while the array has a lookup complexity in O(1)
.
This sounds like your algorithm must be much worse. But in fact it is not, as k
is bounded by 128
(else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1)
as well with a bit bigger constants than the array lookup.
*
assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n)
resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.
This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n)
for the converstion, O(log n)
for the lookup) before its lookup time degenerates to O(n)
because of too many collisions.
I moved this discussion into chat.
â allo
Aug 21 at 14:50
add a comment |Â
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
12
down vote
accepted
As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.
Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.
For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char
values there are. You would also need to remove the check for a string longer than 128.
This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.
add a comment |Â
up vote
12
down vote
accepted
As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.
Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.
For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char
values there are. You would also need to remove the check for a string longer than 128.
This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.
add a comment |Â
up vote
12
down vote
accepted
up vote
12
down vote
accepted
As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.
Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.
For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char
values there are. You would also need to remove the check for a string longer than 128.
This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.
As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.
Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.
For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char
values there are. You would also need to remove the check for a string longer than 128.
This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.
answered Aug 21 at 7:58
Sweeper
55.7k962119
55.7k962119
add a comment |Â
add a comment |Â
up vote
3
down vote
Both algorithms have time complexity of O(N). The difference is in their space complexity.
The book's solution will always require storage for 128 characters - O(1)
, while your solution's space requirement will vary linearly according to the input - O(N)
.
The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
 |Â
show 2 more comments
up vote
3
down vote
Both algorithms have time complexity of O(N). The difference is in their space complexity.
The book's solution will always require storage for 128 characters - O(1)
, while your solution's space requirement will vary linearly according to the input - O(N)
.
The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
 |Â
show 2 more comments
up vote
3
down vote
up vote
3
down vote
Both algorithms have time complexity of O(N). The difference is in their space complexity.
The book's solution will always require storage for 128 characters - O(1)
, while your solution's space requirement will vary linearly according to the input - O(N)
.
The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.
Both algorithms have time complexity of O(N). The difference is in their space complexity.
The book's solution will always require storage for 128 characters - O(1)
, while your solution's space requirement will vary linearly according to the input - O(N)
.
The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.
edited Aug 21 at 8:02
answered Aug 21 at 7:52
ernest_k
14.4k31429
14.4k31429
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
 |Â
show 2 more comments
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
in an interview would my solution be unacceptable or trivial?
â fsdff
Aug 21 at 7:54
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
â ernest_k
Aug 21 at 7:57
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
â Yves Daoust
Aug 21 at 8:23
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
â ernest_k
Aug 21 at 8:30
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
â Yves Daoust
Aug 21 at 8:33
 |Â
show 2 more comments
up vote
2
down vote
The hashmap is in theory acceptable, but is a waste.
A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.
This adds a lot of overhead in terms of space and time, compared to a straight array.
Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.
As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
 |Â
show 1 more comment
up vote
2
down vote
The hashmap is in theory acceptable, but is a waste.
A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.
This adds a lot of overhead in terms of space and time, compared to a straight array.
Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.
As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
 |Â
show 1 more comment
up vote
2
down vote
up vote
2
down vote
The hashmap is in theory acceptable, but is a waste.
A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.
This adds a lot of overhead in terms of space and time, compared to a straight array.
Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.
As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.
The hashmap is in theory acceptable, but is a waste.
A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.
This adds a lot of overhead in terms of space and time, compared to a straight array.
Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.
As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.
edited Aug 21 at 13:34
answered Aug 21 at 8:21
Yves Daoust
35.1k62556
35.1k62556
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
 |Â
show 1 more comment
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs ÃÂ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And IâÂÂm curious how you intend to improve the asymptotic space complexity, which youâÂÂve omitted to explain in your answer.
â Konrad Rudolph
Aug 21 at 13:58
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
â Yves Daoust
Aug 21 at 14:03
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
But this isnâÂÂt the hash function being used here, and this isnâÂÂt how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how youâÂÂd implement this algorithm for a Unicode alphabet.
â Konrad Rudolph
Aug 21 at 14:08
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
@KonradRudolph: the question is about ASCII.
â Yves Daoust
Aug 21 at 14:17
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
â Konrad Rudolph
Aug 21 at 14:21
 |Â
show 1 more comment
up vote
1
down vote
Your algorithm is also O(1)
. You can think about complexity like how my algorithm will react to the change in amount of elements processed
. Therefore O(n)
and O(2n)
are effectively equal.
People are talking about O notation as growth rate here
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
add a comment |Â
up vote
1
down vote
Your algorithm is also O(1)
. You can think about complexity like how my algorithm will react to the change in amount of elements processed
. Therefore O(n)
and O(2n)
are effectively equal.
People are talking about O notation as growth rate here
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Your algorithm is also O(1)
. You can think about complexity like how my algorithm will react to the change in amount of elements processed
. Therefore O(n)
and O(2n)
are effectively equal.
People are talking about O notation as growth rate here
Your algorithm is also O(1)
. You can think about complexity like how my algorithm will react to the change in amount of elements processed
. Therefore O(n)
and O(2n)
are effectively equal.
People are talking about O notation as growth rate here
answered Aug 21 at 7:48
amerykanin
1251212
1251212
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
add a comment |Â
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
Not O(1). Definitely. Unless there is an upper bound on input size.
â user202729
Aug 21 at 9:27
add a comment |Â
up vote
1
down vote
Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.
add a comment |Â
up vote
1
down vote
Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.
Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.
answered Aug 21 at 7:52
entpnerd
4,33211641
4,33211641
add a comment |Â
add a comment |Â
up vote
0
down vote
The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k)
, while the array has a lookup complexity in O(1)
.
This sounds like your algorithm must be much worse. But in fact it is not, as k
is bounded by 128
(else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1)
as well with a bit bigger constants than the array lookup.
*
assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n)
resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.
This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n)
for the converstion, O(log n)
for the lookup) before its lookup time degenerates to O(n)
because of too many collisions.
I moved this discussion into chat.
â allo
Aug 21 at 14:50
add a comment |Â
up vote
0
down vote
The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k)
, while the array has a lookup complexity in O(1)
.
This sounds like your algorithm must be much worse. But in fact it is not, as k
is bounded by 128
(else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1)
as well with a bit bigger constants than the array lookup.
*
assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n)
resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.
This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n)
for the converstion, O(log n)
for the lookup) before its lookup time degenerates to O(n)
because of too many collisions.
I moved this discussion into chat.
â allo
Aug 21 at 14:50
add a comment |Â
up vote
0
down vote
up vote
0
down vote
The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k)
, while the array has a lookup complexity in O(1)
.
This sounds like your algorithm must be much worse. But in fact it is not, as k
is bounded by 128
(else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1)
as well with a bit bigger constants than the array lookup.
*
assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n)
resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.
This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n)
for the converstion, O(log n)
for the lookup) before its lookup time degenerates to O(n)
because of too many collisions.
The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k)
, while the array has a lookup complexity in O(1)
.
This sounds like your algorithm must be much worse. But in fact it is not, as k
is bounded by 128
(else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1)
as well with a bit bigger constants than the array lookup.
*
assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n)
resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.
This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n)
for the converstion, O(log n)
for the lookup) before its lookup time degenerates to O(n)
because of too many collisions.
edited Aug 21 at 14:39
answered Aug 21 at 12:59
allo
1,63321728
1,63321728
I moved this discussion into chat.
â allo
Aug 21 at 14:50
add a comment |Â
I moved this discussion into chat.
â allo
Aug 21 at 14:50
I moved this discussion into chat.
â allo
Aug 21 at 14:50
I moved this discussion into chat.
â allo
Aug 21 at 14:50
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51943836%2fcomparison-of-these-two-algorithms%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
15
Well, the book "solution" fails in Java because characters are 16 bits and can have a value from
0
to65535
, so a 128-byte array won't work.â Jim Garrison
Aug 21 at 7:43
22
They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
â Amadan
Aug 21 at 7:43
5
Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
â Jilles van Gurp
Aug 21 at 8:54
7
I suggest throwing the book away if the author thinks there are only 128 characters
â phuclv
Aug 21 at 9:33
3
To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
â Nathan Adams
Aug 21 at 9:43