yEverything

Grouping by Category

yEvery 2024. 5. 3. 20:20

%matplotlib inline
import pandas as pd

df = pd.read_csv('laptops.csv')
df.head()

Read the laptops.csv file.

It contains data like the following.

brand_nation = {
    'Dell': 'U.S',
    'Apple': 'U.S',
    'Acer': 'Taiwan',
    'HP': 'U.S',
    'Lenovo': 'China',
    'Alienware': 'U.S',
    "Microsoft": 'U.S',
    'Asus': 'Taiwan'
}

Create a brand_nation dictionary that maps each brand to its country.

df['brand'].map(brand_nation)

Use the map method to replace each value in the brand column with its corresponding entry in brand_nation.

The values are converted as expected.
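One thing worth checking with map: a brand that is missing from the dictionary does not raise an error, it silently becomes NaN. A minimal sketch, assuming a made-up brand list ('Razer' is hypothetical and is left out of the dictionary on purpose):

```python
import pandas as pd

# Hypothetical brands for illustration; 'Razer' is deliberately absent from the dictionary.
brands = pd.Series(['Dell', 'Lenovo', 'Razer'])
brand_nation = {'Dell': 'U.S', 'Lenovo': 'China'}

mapped = brands.map(brand_nation)    # keys not found in the dictionary become NaN
filled = mapped.fillna('Unknown')    # optional: give missing entries an explicit label

print(mapped.tolist())
print(filled.tolist())
```

This matters for the later steps because groupby leaves out rows whose key is NaN by default, so unmapped brands would quietly disappear from the grouped results.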

df['brand_nation'] = df['brand'].map(brand_nation)
df

Store the result in a new brand_nation column.

The brand_nation column has been created.

nation_groups = df.groupby('brand_nation')
type(nation_groups)

Group the rows by brand_nation with groupby, store the result in nation_groups, and check its type.

The type is a pandas DataFrameGroupBy object.
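A DataFrameGroupBy object does not display its rows on its own, so it can help to peek inside. A small sketch, assuming a few made-up rows in place of laptops.csv:

```python
import pandas as pd

# Hypothetical sample rows standing in for laptops.csv.
df = pd.DataFrame({
    'brand': ['Dell', 'Acer', 'Lenovo', 'Apple'],
    'brand_nation': ['U.S', 'Taiwan', 'China', 'U.S'],
    'price': [1200, 700, 600, 1500],
})

nation_groups = df.groupby('brand_nation')
print(type(nation_groups).__name__)   # class name of the grouped object

# Iterating yields (group key, sub-DataFrame) pairs.
for nation, group in nation_groups:
    print(nation, len(group))

# get_group pulls out the rows that belong to a single group.
us_rows = nation_groups.get_group('U.S')
print(us_rows)
```

Nothing is computed until a method such as count, max, or mean is called on the grouped object.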

nation_groups.count()

Call the count method on nation_groups, which is grouped by brand_nation.

The leftmost column (the index) shows that the counts are organized by brand_nation.
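Note that count reports the number of non-missing values in each column, not the number of rows. When the row count is what you want, size is the closer fit. A sketch with made-up data that contains one missing ram value:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second U.S laptop has no ram value recorded.
df = pd.DataFrame({
    'brand_nation': ['U.S', 'U.S', 'China'],
    'ram': [8, np.nan, 4],
})
nation_groups = df.groupby('brand_nation')

non_null = nation_groups.count()   # non-missing values per column, per group
rows = nation_groups.size()        # number of rows per group, missing or not

print(non_null)
print(rows)
```

If the two results differ for a group, that group has missing values in that column.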

nation_groups.max(numeric_only=True)

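Find the maximum of each numeric column per group.

The largest ram among the Chinese-brand laptops is 8; for the Taiwanese and U.S. brands it is 16.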

nation_groups.mean(numeric_only=True)

The mean of each numeric column per group is shown in the output.
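Instead of calling max and mean one at a time, agg can collect several statistics for one column in a single table. A minimal sketch, assuming made-up prices rather than the real laptops.csv values:

```python
import pandas as pd

# Hypothetical prices for illustration only.
df = pd.DataFrame({
    'brand_nation': ['U.S', 'U.S', 'China', 'Taiwan'],
    'price': [1200, 1500, 600, 700],
})

# One row per group, one column per statistic.
summary = df.groupby('brand_nation')['price'].agg(['min', 'mean', 'max'])
print(summary)
```

Selecting the price column before agg keeps the result narrow; without the selection, every column would get all three statistics.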

nation_groups.first()

The first method returns the first non-missing value of each column within each group.

The topmost brand in the China group is Lenovo, in the Taiwan group it is Acer, and in the U.S. group it is Dell.

nation_groups.last()

The opposite of first: last returns the last value in each group.
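One subtlety: first and last skip missing values column by column, so the result is not always a single real row from the data. When the literal first row of each group is needed, head(1) is one option. A sketch with made-up rows where the first U.S laptop has no ram value:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the first U.S row is missing its ram value.
df = pd.DataFrame({
    'brand_nation': ['U.S', 'U.S', 'China'],
    'brand': ['Dell', 'Apple', 'Lenovo'],
    'ram': [np.nan, 16, 8],
})
groups = df.groupby('brand_nation')

first_values = groups.first()   # first non-missing value of each column
first_rows = groups.head(1)     # literal first row of each group, original index kept
last_values = groups.last()     # last non-missing value of each column

print(first_values)
print(first_rows)
print(last_values)
```

In this sketch the U.S row of first_values pairs the brand Dell with ram 16, a value that actually comes from the Apple row, while head(1) keeps the Dell row with its missing ram.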

nation_groups.plot(kind='box', y='price')

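Draw a box plot of price for each group to compare them.

The plots appear in the order China, Taiwan, U.S. The Chinese-brand laptops are the cheapest, and the U.S.-brand laptops are generally the most expensive.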
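Calling plot on the grouped object draws a separate figure per group, which makes the price scales hard to compare. A possible alternative is DataFrame.boxplot with the by argument, which puts all groups on one shared axis. A rough sketch, assuming made-up prices:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical prices for illustration only.
df = pd.DataFrame({
    'brand_nation': ['U.S', 'U.S', 'U.S', 'China', 'China', 'Taiwan', 'Taiwan'],
    'price': [1200, 1500, 1800, 500, 650, 700, 900],
})

fig, ax = plt.subplots()
# One box per brand_nation value, all drawn on the same axis.
df.boxplot(column='price', by='brand_nation', ax=ax)
ax.set_ylabel('price')
```

With a shared y-axis, the medians and spreads of the three groups can be compared directly.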

nation_groups.plot(kind='hist', y='price')

This time, look at the price distribution of each group with histograms.

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
