python - Sort & Unique vs Set -


in python 2.7, in order retrieve set of unique strings redundant list of strings, preferred (~10 million strings of length ~20):

a) sort list , delete repeating strings

sort(l) unique(l) #some linear time function 

b) put them in set

set(l) 

note not care order of strings.

i made simple test check running time of both solutions. first test creates set , second test sorts list (it doesn't remove duplicates sake of simplicity).

as expected creating set faster sorting, since complexity o(n) while sorting o(nlogn).

import random import string import time   def random_str():     size = random.randint(10, 20)     chars = string.ascii_letters + string.digits     return ''.join(random.choice(chars) _ in range(size))   l = [random_str() _ in xrange(1000000)]  t1 = time.clock() in range(10):     set(l) t2 = time.clock() print(round(t2-t1, 3))  t1 = time.clock() in range(10):     sorted(l) t2 = time.clock() print(round(t2-t1, 3)) 

the output got:

2.77 11.83 

Comments

Popular posts from this blog

Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project.Error occurred in starting fork -

windows - Debug iNetMgr.exe unhandle exception System.Management.Automation.CmdletInvocationException -

configurationsection - activeMq-5.13.3 setup configurations for wildfly 10.0.0 -