python - Counting Unique Values of Categories of a Column Given a Condition on Another Column


I have a data frame whose rows represent transactions done by users. Note that more than one row can have the same user_id. Given the column names gender and user_id, running:

df.gender.value_counts() 

returns frequencies that are spurious, since a given user may be counted more than once. For example, it may tell me there are 50 male individuals while there are actually fewer.

Is there a way I can condition value_counts() to count only once per user_id?

You want to use pandas' groupby on the DataFrame:

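import random
import pandas as pd

users = {'a': 'male', 'b': 'female', 'c': 'female'}
# 50 transactions, each made by a randomly chosen user
ul = [{'id': k, 'gender': users[k]} for k in random.choices(list(users), k=50)]
df = pd.DataFrame(ul)

# number of distinct ids per gender
print(df.groupby('gender')['id'].nunique())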

This yields the following (it depends on the random draw, but the chances are quite high that each of the 3 keys is chosen at least once in 50 samples):

gender
female    2
male      1
Name: id, dtype: int64
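If you would rather stay close to value_counts(), as the question asks, one possible alternative is to drop duplicate users first and then count. The sketch below uses a small made-up transaction table (the values are purely illustrative) and assumes each user_id always appears with the same gender. It compares the naive per-row count with the per-user count and with the groupby approach above:

```python
import pandas as pd

# Hypothetical transaction log: one row per transaction, so users repeat.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "gender": ["male", "male", "female", "female", "female", "female"],
})

# Naive count: this counts transactions, not users.
per_row = df["gender"].value_counts()

# Keep a single row per user, then count genders.
per_user = df.drop_duplicates(subset="user_id")["gender"].value_counts()

# Same per-user figures via groupby + nunique.
per_user_groupby = df.groupby("gender")["user_id"].nunique()

print(per_row)
print(per_user)
print(per_user_groupby)
```

Here the naive count reports 2 males and 4 females, while both per-user variants report 1 male and 2 females. If a user could appear with different gender values across rows, drop_duplicates would keep only the first one seen, whereas groupby with nunique would count that user once under each gender.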
