python - Sort & Unique vs Set -
in python 2.7, in order retrieve set of unique strings redundant list of strings, preferred (~10 million strings of length ~20):
a) sort list , delete repeating strings
sort(l) unique(l) #some linear time function b) put them in set
set(l) note not care order of strings.
i made simple test check running time of both solutions. first test creates set , second test sorts list (it doesn't remove duplicates sake of simplicity).
as expected creating set faster sorting, since complexity o(n) while sorting o(nlogn).
import random import string import time def random_str(): size = random.randint(10, 20) chars = string.ascii_letters + string.digits return ''.join(random.choice(chars) _ in range(size)) l = [random_str() _ in xrange(1000000)] t1 = time.clock() in range(10): set(l) t2 = time.clock() print(round(t2-t1, 3)) t1 = time.clock() in range(10): sorted(l) t2 = time.clock() print(round(t2-t1, 3)) the output got:
2.77 11.83
Comments
Post a Comment