python - Sort & Unique vs Set -
in python 2.7, in order retrieve set of unique strings redundant list of strings, preferred (~10 million strings of length ~20):
a) sort list , delete repeating strings
sort(l) unique(l) #some linear time function
b) put them in set
set(l)
note not care order of strings.
i made simple test check running time of both solutions. first test creates set
, second test sorts list (it doesn't remove duplicates sake of simplicity).
as expected creating set faster sorting, since complexity o(n)
while sorting o(nlogn)
.
import random import string import time def random_str(): size = random.randint(10, 20) chars = string.ascii_letters + string.digits return ''.join(random.choice(chars) _ in range(size)) l = [random_str() _ in xrange(1000000)] t1 = time.clock() in range(10): set(l) t2 = time.clock() print(round(t2-t1, 3)) t1 = time.clock() in range(10): sorted(l) t2 = time.clock() print(round(t2-t1, 3))
the output got:
2.77 11.83
Comments
Post a Comment