python - Counting Unique Values of Categories of a Column Given a Condition on Another Column
I have a data frame whose rows represent transactions done by users. Note that more than one row can have the same user_id. Given the column names gender and user_id, running:
df.gender.value_counts()
returns frequencies that are spurious, since a given user may be counted more than once. For example, it may tell me there are 50 male individuals while there are actually fewer.
Is there a way I can condition value_counts() to count only once per user_id?
You want to use pandas' groupby on your dataframe:
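import random
import pandas as pd

# Map each user id to a gender, then draw 50 random transactions.
users = {'a': 'male', 'b': 'female', 'c': 'female'}
ul = [{'id': k, 'gender': users[k]} for k in random.choices(list(users), k=50)]
df = pd.DataFrame(ul)
# Count distinct user ids within each gender.
print(df.groupby('gender')['id'].nunique())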
This yields the following (it depends on the random draws, but the chance that each of the 3 keys is chosen at least once in 50 samples is quite high):
gender
female    2
male      1
Name: id, dtype: int64
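Since the question asks about conditioning value_counts() itself, here is a minimal sketch of an alternative: drop duplicate user_id rows first, then call value_counts(). The small data frame is made up for illustration, and the sketch assumes every user_id maps to exactly one gender.

```python
import pandas as pd

# Hypothetical transaction log: user 'a' has three rows, 'b' two, 'c' one.
df = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c"],
    "gender": ["male", "male", "male", "female", "female", "female"],
})

# Naive count: this counts rows (transactions), not people.
per_row = df["gender"].value_counts()

# Keep one row per user_id, then count genders.
per_user = df.drop_duplicates(subset="user_id")["gender"].value_counts()

# The groupby form from the answer, for comparison.
per_user_groupby = df.groupby("gender")["user_id"].nunique()

print(per_row)
print(per_user)
print(per_user_groupby)
```

Both forms count each user once. If a user_id could appear with more than one gender, drop_duplicates would keep only the first row seen for that user, whereas groupby with nunique would count the user under each gender.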