<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Recording Study Log</title>
    <link>https://junecho.tistory.com/</link>
    <description>Personal archive</description>
    <language>ko</language>
    <pubDate>Tue, 12 May 2026 14:56:40 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>junecho</managingEditor>
    <image>
      <title>Recording Study Log</title>
      <url>https://tistory1.daumcdn.net/tistory/8190269/attach/f30f8c3ad33a46c3a9566ef6f7221f93</url>
      <link>https://junecho.tistory.com</link>
    </image>
    <item>
      <title>[251027] Day 66</title>
      <link>https://junecho.tistory.com/74</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Solved two Code Kata problems and reached problem 100.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;They were easy, so I'm not posting the code.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Honestly, I wrote this post because I wanted to share this:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;846&quot; data-origin-height=&quot;163&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ZLehN/dJMcadAhwPx/GOnK3xZGQZngdHV4PlDtX0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ZLehN/dJMcadAhwPx/GOnK3xZGQZngdHV4PlDtX0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ZLehN/dJMcadAhwPx/GOnK3xZGQZngdHV4PlDtX0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FZLehN%2FdJMcadAhwPx%2FGOnK3xZGQZngdHV4PlDtX0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;846&quot; height=&quot;163&quot; data-origin-width=&quot;846&quot; data-origin-height=&quot;163&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
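&lt;p data-ke-size=&quot;size16&quot;&gt;Haha, the PPT got a lot of praise when I first presented it, and the written feedback praised it again ^_^v&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The second project was my first time taking the lead on preprocessing, and honestly&amp;hellip; preprocessing seems to be something only meticulous, detail-sensitive people can do well.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For someone like me who goes &quot;eh, good enough~&quot;, it is far too sensitive a beast.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As for the Tableau and crawling sessions&amp;hellip; I have no idea how to put them into writing, so I can't summarize them yet.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;~picking them up by feel for now~&lt;/p&gt;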
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Oh, I should post the thing that had me badly stuck during the crawling assignment.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;disabled OPEN_MAP_AND_LOCAL service&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;API request failed: 401 - {&quot;errorType&quot;:&quot;AccessDeniedError&quot;,&quot;message&quot;:&quot;KA Header is required but neither os nor origin field is given&quot;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The errors that kept popping up&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip; argh&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I wasted over an hour on this wretched Kakao API authentication.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
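&lt;p data-ke-size=&quot;size16&quot;&gt;For reference, a minimal sketch of how a Kakao Local REST call is typically authenticated. It assumes the keyword-search endpoint and a placeholder key; the helper name build_keyword_search is made up for illustration, and the request is only built here, not sent.&lt;/p&gt;

```python
import urllib.parse
import urllib.request

# Placeholder, not a real key: use the REST API key shown in the Kakao Developers console.
REST_API_KEY = "YOUR_REST_API_KEY"


def build_keyword_search(query: str, api_key: str = REST_API_KEY) -> urllib.request.Request:
    """Build (but do not send) a Kakao Local keyword-search request."""
    params = urllib.parse.urlencode({"query": query})
    url = "https://dapi.kakao.com/v2/local/search/keyword.json?" + params
    # The REST key travels in the Authorization header, prefixed with "KakaoAK ".
    return urllib.request.Request(url, headers={"Authorization": f"KakaoAK {api_key}"})


req = build_keyword_search("cafe")
print(req.full_url)
print(req.get_header("Authorization"))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Sending it would be urllib.request.urlopen(req). Presumably that call is where the errors above surface while the map service is still switched off for the app.&lt;/p&gt;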
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://developers.kakao.com/console/app&quot;&gt;https://developers.kakao.com/console/app&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; App &amp;rarr; App settings &amp;rarr; App &amp;rarr; Request additional features &amp;rarr; Kakao Map &amp;rarr; Apply &amp;rarr; Status ON&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1208&quot; data-origin-height=&quot;470&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdW0LU/dJMcagDL2Db/ibApmM38oPovNLrR4Mk7xk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdW0LU/dJMcagDL2Db/ibApmM38oPovNLrR4Mk7xk/img.png&quot; data-alt=&quot;\&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdW0LU/dJMcagDL2Db/ibApmM38oPovNLrR4Mk7xk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbdW0LU%2FdJMcagDL2Db%2FibApmM38oPovNLrR4Mk7xk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1208&quot; height=&quot;470&quot; data-origin-width=&quot;1208&quot; data-origin-height=&quot;470&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;\&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;589&quot; data-origin-height=&quot;582&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/8MAOv/dJMcagDL2Dh/em6hUTkCkKSKiPMDg9AyI1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/8MAOv/dJMcagDL2Dh/em6hUTkCkKSKiPMDg9AyI1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/8MAOv/dJMcagDL2Dh/em6hUTkCkKSKiPMDg9AyI1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F8MAOv%2FdJMcagDL2Dh%2Fem6hUTkCkKSKiPMDg9AyI1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;589&quot; height=&quot;582&quot; data-origin-width=&quot;589&quot; data-origin-height=&quot;582&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>Sparta/etc</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/74</guid>
      <comments>https://junecho.tistory.com/74#entry74comment</comments>
      <pubDate>Mon, 27 Oct 2025 20:03:33 +0900</pubDate>
    </item>
    <item>
      <title>[251024] QCC</title>
      <link>https://junecho.tistory.com/73</link>
      <description>&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #99cefa;&quot;&gt;&lt;b&gt; &amp;nbsp; CODEKATA&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
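&lt;p data-ke-size=&quot;size16&quot;&gt;My first TIL in a long while; well, I call it a TIL, but it's really just a record of the QCC.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Wow&amp;hellip; I hadn't touched SQL for a while and could barely remember it. What do I do?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;No time for Code Kata, but there's the team project to do and a certification exam to study for. Argh~~~&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Busy, busy: the life of a modern person.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Can't I be a caveman instead of a modern person?&lt;/p&gt;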
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;+&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I ended up as team leader&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&amp;hellip;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;OMG.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I said I'd turn into a dictator if I led the team, and they said sometimes that's needed too T_T&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I said I'm bad at managing schedules, and they said they'd back me up on scheduling, so I should just be the leader ~~~~&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;T_T&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;And when I told other people I'd become team leader, they were all delighted. Terrible people.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;더보기&quot; data-text-less=&quot;닫기&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;더보기&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1.&lt;/p&gt;
&lt;pre class=&quot;n1ql&quot;&gt;&lt;code&gt;SELECT COUNT(gnp - gnpold) AS country_count
FROM country
WHERE 
  gnpold IS NOT NULL AND gnpold != 0 AND
  population &amp;gt;= 10000000 AND (gnp - gnpold) &amp;lt; 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2.&lt;/p&gt;
&lt;pre class=&quot;pgsql&quot;&gt;&lt;code&gt;SELECT district, ROUND(AVG(population)) AS average_population
FROM city
GROUP BY district
HAVING COUNT(name) &amp;gt;= 3
ORDER BY ROUND(AVG(population)) DESC
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For problem 2, the ordering looked off even with ORDER BY. Why?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;At first I wrote ORDER BY average_population, but the ordering looked wrong,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;so as insurance I switched to ROUND(AVG(&lt;span style=&quot;background-color: #fafafa; color: #333333; text-align: start;&quot;&gt;population&lt;/span&gt;)), but the displayed result was the same...&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Should I have used something other than ROUND.....?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3.&lt;/p&gt;
&lt;pre class=&quot;pgsql&quot;&gt;&lt;code&gt;WITH cond AS (
  SELECT cr.continent AS continent, MAX(ct.population) AS max_population
  FROM country cr LEFT JOIN city ct on cr.code = ct.countrycode
  WHERE ct.name IS NOT NULL
  GROUP BY cr.continent
)

SELECT ct.name AS city_name, cr.name AS country_name, cr.continent, cd.max_population AS population
FROM 
  country cr JOIN city ct ON cr.code = ct.countrycode 
  JOIN cond cd ON cr.continent = cd.continent 
  AND ct.population = cd.max_population
ORDER BY population DESC
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr;&lt;/p&gt;
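&lt;p data-ke-size=&quot;size16&quot;&gt;At first I just pulled MAX without a WITH clause, then noticed that the MAX population didn't match the city shown,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;so I computed the MAX population in a WITH clause and joined it back in the main query to find the city.&lt;/p&gt;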
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Tutor's code&lt;/p&gt;
&lt;pre id=&quot;code_1761299398824&quot; class=&quot;sql&quot; data-ke-language=&quot;sql&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;SELECT CityName AS city_name, CountryName AS country_name, Continent AS continent_name, Population AS population
FROM (
    SELECT
        c.Name AS CityName, co.Name AS CountryName, co.Continent, c.Population,
        ROW_NUMBER() OVER (PARTITION BY co.Continent ORDER BY c.Population DESC) AS PopulationRank
    FROM qcc.country co
    JOIN qcc.city c ON c.CountryCode = co.Code
) ranked_cities
WHERE PopulationRank = 1
ORDER BY Population DESC&lt;/code&gt;&lt;/pre&gt;
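&lt;p data-ke-size=&quot;size16&quot;&gt;A small sketch to compare the two approaches side by side, run through Python's sqlite3 on made-up toy rows (the table contents and the name tie-breaker are assumptions for illustration, and window functions need SQLite 3.25 or later). The point it shows: joining back on MAX keeps every city tied for the maximum, while ROW_NUMBER() keeps exactly one row per continent.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data (hypothetical): two cities tie for the top spot in 'North'.
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT, continent TEXT);
CREATE TABLE city (name TEXT, countrycode TEXT, population INTEGER);
INSERT INTO country VALUES ('AAA', 'Aland', 'North'), ('BBB', 'Bland', 'North'), ('CCC', 'Cland', 'South');
INSERT INTO city VALUES ('A1', 'AAA', 500), ('B1', 'BBB', 500), ('B2', 'BBB', 100), ('C1', 'CCC', 300);
""")

# Approach 1: per-continent MAX in a CTE, then join back on the value (keeps ties).
max_join = conn.execute("""
WITH cond AS (
  SELECT cr.continent AS continent, MAX(ct.population) AS max_population
  FROM country cr JOIN city ct ON cr.code = ct.countrycode
  GROUP BY cr.continent
)
SELECT ct.name, cr.continent, ct.population
FROM country cr
JOIN city ct ON cr.code = ct.countrycode
JOIN cond cd ON cr.continent = cd.continent AND ct.population = cd.max_population
ORDER BY ct.population DESC, ct.name
""").fetchall()

# Approach 2: ROW_NUMBER() keeps one row per continent (ties broken by name here).
row_number = conn.execute("""
SELECT name, continent, population FROM (
  SELECT c.name, co.continent, c.population,
         ROW_NUMBER() OVER (PARTITION BY co.continent ORDER BY c.population DESC, c.name) AS rn
  FROM country co JOIN city c ON c.countrycode = co.code
) ranked
WHERE rn = 1
ORDER BY population DESC
""").fetchall()

print(max_join)    # both tied 'North' cities appear
print(row_number)  # one row per continent
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Which one is right depends on whether the problem wants ties reported; without an explicit tie-breaker in the ORDER BY of the window, the row that ROW_NUMBER() picks among tied cities is not guaranteed.&lt;/p&gt;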
&lt;/div&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Sparta/CODEKATA</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/73</guid>
      <comments>https://junecho.tistory.com/73#entry73comment</comments>
      <pubDate>Fri, 24 Oct 2025 16:06:45 +0900</pubDate>
    </item>
    <item>
      <title>[251002] Statistical Testing Practice 01 - t-test</title>
      <link>https://junecho.tistory.com/72</link>
      <description>&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ t-test&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;A method for testing whether the difference between two group means is statistically significant&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;It judges whether a difference is real or just due to chance&lt;br /&gt;&lt;/b&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  이론&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  수행 단계&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ 표본 크기에 따른 정규성 확인&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;n &amp;lt; 30 : Shapiro-Wilk test required &amp;rArr; use a t-test if p &amp;gt; 0.05&lt;/li&gt;
&lt;li&gt;30 &amp;le; n &amp;lt; 100 : check skewness/kurtosis &amp;rArr; use a t-test if |skewness| &amp;lt; 1 and |excess kurtosis| &amp;lt; 2&lt;/li&gt;
&lt;li&gt;n &amp;ge; 100 : the central limit theorem applies &amp;rArr; use a t-test if |skewness| &amp;lt; 2&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test: stats.shapiro()&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;p &amp;gt; 0.05&lt;/b&gt; : consistent with a normal distribution &amp;rArr; &lt;b&gt;t-test&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;p &amp;le; 0.05 : non-normal distribution &amp;rArr; nonparametric test&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Equal-variance test (independent samples only): stats.levene()&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Levene's test p &amp;gt; 0.05 &amp;rArr; equal_var=True (Student&amp;rsquo;s t-test)&lt;/li&gt;
&lt;li&gt;Levene's test p &amp;le; 0.05 &amp;rArr; equal_var=False (Welch's t-test)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Interpreting the result&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;p-value &amp;lt; 0.05 &amp;rArr; significant difference&lt;/li&gt;
&lt;li&gt;Check the effect size with Cohen's d (0.2: small, 0.5: medium, 0.8: large)&lt;/li&gt;
&lt;/ul&gt;
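&lt;p data-ke-size=&quot;size16&quot;&gt;The checklist above can be sketched roughly like this for two independent groups. The samples are randomly generated stand-ins, compare_two_groups is a made-up helper name, and the 0.05 cut-off is just the threshold used in these notes; treat it as one possible way to wire the steps together, not the course's reference solution.&lt;/p&gt;

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical samples standing in for two independent groups.
group_a = rng.normal(loc=13.0, scale=0.8, size=25)
group_b = rng.normal(loc=12.3, scale=0.8, size=25)

ALPHA = 0.05


def compare_two_groups(a, b, alpha=ALPHA):
    """Pick a two-sample test following the checklist and return (test name, p-value)."""
    # Normality: Shapiro-Wilk on each group; p > alpha means normality is not rejected.
    normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))
    if not normal:
        # Non-normal data: fall back to a nonparametric test.
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", float(result.pvalue)
    # Equal variances: Levene's test decides between Student's and Welch's t-test.
    equal_var = bool(stats.levene(a, b).pvalue > alpha)
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "Student's t-test" if equal_var else "Welch's t-test"
    return name, float(result.pvalue)


name, p_value = compare_two_groups(group_a, group_b)
print(name, round(p_value, 4))
```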
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  종류&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 80px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style12&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 19.6512%; height: 20px;&quot;&gt;&lt;b&gt;Type&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 27.9069%; height: 20px;&quot;&gt;&lt;b&gt;Use case&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 18.0233%; height: 20px;&quot;&gt;&lt;b&gt;Function&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 34.4186%; height: 20px;&quot;&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 19.6512%; height: 20px;&quot;&gt;&lt;b&gt;Independent-samples t-test&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 27.9069%; height: 20px;&quot;&gt;Compare two independent groups&lt;/td&gt;
&lt;td style=&quot;width: 18.0233%; height: 20px;&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt; &lt;b&gt;ttest_ind()&lt;/b&gt; &lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;width: 34.4186%; height: 20px;&quot;&gt;Alcohol content by wine type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 19.6512%; height: 20px;&quot;&gt;&lt;b&gt;Paired-samples t-test&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 27.9069%; height: 20px;&quot;&gt;Before/after comparison of the same subjects&lt;/td&gt;
&lt;td style=&quot;width: 18.0233%; height: 20px;&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt; &lt;b&gt;ttest_rel()&lt;/b&gt; &lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;width: 34.4186%; height: 20px;&quot;&gt;Blood glucose before and after treatment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 19.6512%; height: 20px;&quot;&gt;&lt;b&gt;One-sample t-test&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 27.9069%; height: 20px;&quot;&gt;Compare one group against a reference value&lt;/td&gt;
&lt;td style=&quot;width: 18.0233%; height: 20px;&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt; &lt;b&gt;ttest_1samp()&lt;/b&gt; &lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;width: 34.4186%; height: 20px;&quot;&gt;Is the mean alcohol content 13% ABV?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  정규성 확인을 위한 Q-Q Plot 이해와 해석&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Quantile-Quantile Plot&lt;/b&gt; : a tool for visually checking whether data follow a normal distribution&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;X-axis : theoretical quantiles of the normal distribution&lt;/li&gt;
&lt;li&gt;Y-axis : quantiles of the observed data&lt;/li&gt;
&lt;li&gt;Key point : if the data follow a normal distribution, the points fall close to a straight line&lt;/li&gt;
&lt;li&gt;Skewness
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A statistic measuring how asymmetric the distribution is&lt;/li&gt;
&lt;li&gt;The closer to 0, the more symmetric the distribution (a normal distribution has skewness 0)&lt;/li&gt;
&lt;li&gt;&amp;cap; shape &amp;rarr; left-skew (long left tail)&lt;/li&gt;
&lt;li&gt;&amp;cup; shape &amp;rarr; right-skew (long right tail)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Kurtosis
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A statistic measuring how thick the tails are and how peaked the center of the distribution is&lt;/li&gt;
&lt;li&gt;Kurtosis of a normal distribution : 3 under Pearson's definition (0 when reported as excess kurtosis)&lt;/li&gt;
&lt;li&gt;Both ends &amp;darr; &amp;rarr; light-tailed (extreme values &amp;darr;)&lt;/li&gt;
&lt;li&gt;Both ends &amp;uarr; &amp;rarr; heavy-tailed (extreme values &amp;uarr;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
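&lt;p data-ke-size=&quot;size16&quot;&gt;To make the two statistics concrete, here is a small standard-library sketch using the population (biased) moment formulas. The function names and the toy samples are made up for illustration. Note that scipy.stats.kurtosis reports excess kurtosis by default (fisher=True), which is why a normal sample shows up near 0 there rather than 3.&lt;/p&gt;

```python
import statistics


def central_moment(data, k):
    """k-th central moment using the population (biased) definition."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """m3 / m2**1.5: 0 for a symmetric sample, positive for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data, excess=False):
    """m4 / m2**2 (Pearson, normal = 3); with excess=True subtract 3 (normal = 0)."""
    value = central_moment(data, 4) / central_moment(data, 2) ** 2
    return value - 3.0 if excess else value


symmetric = [1, 2, 3, 4, 5]         # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical sample with a long right tail

print(skewness(symmetric))       # 0.0 for a perfectly symmetric sample
print(skewness(right_skewed))    # positive
print(kurtosis(symmetric), kurtosis(symmetric, excess=True))
```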
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 122px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style15&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 17px;&quot;&gt;&lt;b&gt; &lt;span data-token-index=&quot;0&quot;&gt;Pattern&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 17px;&quot;&gt;&lt;b&gt; &lt;span data-token-index=&quot;0&quot;&gt;Q-Q plot shape&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 17px;&quot;&gt;&lt;b&gt; &lt;span data-token-index=&quot;0&quot;&gt;Meaning&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 17px;&quot;&gt;&lt;b&gt; &lt;span data-token-index=&quot;0&quot;&gt;What to do&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 17px;&quot;&gt;Normal&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 17px;&quot;&gt;Straight line&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 17px;&quot;&gt;Normal distribution&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 17px;&quot;&gt;t-test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 20px;&quot;&gt;Light tail&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 20px;&quot;&gt;S-shape&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 20px;&quot;&gt;Extreme values &amp;darr;&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 20px;&quot;&gt;Not a major problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 17px;&quot;&gt;Heavy tail&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 17px;&quot;&gt;Reverse S-shape&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 17px;&quot;&gt;Extreme values &amp;uarr;&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 17px;&quot;&gt;Consider a nonparametric test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 17px;&quot;&gt;Left-skew&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 17px;&quot;&gt;&amp;cap; shape&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 17px;&quot;&gt;Negative skewness&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 17px;&quot;&gt;Consider a square transform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 17px;&quot;&gt;Right-skew&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 17px;&quot;&gt;&amp;cup; shape&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 17px;&quot;&gt;Positive skewness&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 17px;&quot;&gt;Consider a log transform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 25.3488%; height: 17px;&quot;&gt;Bimodal&lt;/td&gt;
&lt;td style=&quot;width: 18.9535%; height: 17px;&quot;&gt;Staircase shape&lt;/td&gt;
&lt;td style=&quot;width: 26.6279%; height: 17px;&quot;&gt;Discrete data&lt;/td&gt;
&lt;td style=&quot;width: 29.0698%; height: 17px;&quot;&gt;Nonparametric test recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1789&quot; data-origin-height=&quot;1210&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/PUdZa/btsQ3cBAexn/0bkDEINv5JdeyHmpsyWjd1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/PUdZa/btsQ3cBAexn/0bkDEINv5JdeyHmpsyWjd1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/PUdZa/btsQ3cBAexn/0bkDEINv5JdeyHmpsyWjd1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FPUdZa%2FbtsQ3cBAexn%2F0bkDEINv5JdeyHmpsyWjd1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1789&quot; height=&quot;1210&quot; data-origin-width=&quot;1789&quot; data-origin-height=&quot;1210&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;811&quot; data-origin-height=&quot;966&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/deP22d/btsQ3x6yU2J/yM5fr0QvLWgaU0Xh5b0SbK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/deP22d/btsQ3x6yU2J/yM5fr0QvLWgaU0Xh5b0SbK/img.png&quot; data-alt=&quot;https://jtr13.github.io/EDAVold/qqplot.html&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/deP22d/btsQ3x6yU2J/yM5fr0QvLWgaU0Xh5b0SbK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdeP22d%2FbtsQ3x6yU2J%2FyM5fr0QvLWgaU0Xh5b0SbK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;811&quot; height=&quot;966&quot; data-origin-width=&quot;811&quot; data-origin-height=&quot;966&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;https://jtr13.github.io/EDAVold/qqplot.html&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  p-value   Cohen&amp;rsquo;s d&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style12&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 13.7984%;&quot;&gt;&lt;b&gt; &lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;Aspect&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;&lt;b&gt; &lt;span data-token-index=&quot;0&quot;&gt;p-value (statistical significance)&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 45.5426%;&quot;&gt;&lt;b&gt; &lt;span data-token-index=&quot;0&quot;&gt;Cohen&amp;rsquo;s d (practical significance)&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 13.7984%;&quot;&gt;&lt;b&gt;Key question&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;&amp;ldquo;Is this result due to chance?&amp;rdquo;&lt;/td&gt;
&lt;td style=&quot;width: 45.5426%;&quot;&gt;&amp;ldquo;Does this difference matter in practice?&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 13.7984%;&quot;&gt;&lt;b&gt;Characteristic&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;Sensitive to sample size&lt;/td&gt;
&lt;td style=&quot;width: 45.5426%;&quot;&gt;Standardized measure that does not depend on sample size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 13.7984%;&quot;&gt;&lt;b&gt;Pros and cons&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;Does not tell you the size of the difference ❌&lt;/td&gt;
&lt;td style=&quot;width: 45.5426%;&quot;&gt;Shows the size of the actual effect&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
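&lt;p data-ke-size=&quot;size16&quot;&gt;Cohen's d is simple enough to compute by hand. A minimal standard-library sketch for two independent groups, using the pooled standard deviation; the sample values and the helper names cohens_d and effect_label are invented for illustration, and the labels follow the 0.2 / 0.5 / 0.8 convention from the checklist.&lt;/p&gt;

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


def effect_label(d):
    """Map the absolute value of d onto the conventional 0.2 / 0.5 / 0.8 cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


group_a = [4, 6, 8, 10]  # hypothetical measurements
group_b = [2, 4, 6, 8]
d = cohens_d(group_a, group_b)
print(round(d, 3), effect_label(d))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Because d divides the mean difference by the spread, it stays the same if both groups simply get more observations with the same means and variances, whereas the p-value would keep shrinking.&lt;/p&gt;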
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&amp;nbsp;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  비모수 검정 대안&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 80px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style13&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 20.0775%; height: 20px;&quot;&gt;&lt;b&gt;t-test type&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 38.1007%; height: 20px;&quot;&gt;&lt;b&gt;Nonparametric alternative&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 41.8217%; height: 20px;&quot;&gt;&lt;b&gt;Function&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 20.0775%; height: 20px;&quot;&gt;&lt;b&gt;Independent-samples t-test&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 38.1007%; height: 20px;&quot;&gt;Mann-Whitney U test&lt;/td&gt;
&lt;td style=&quot;width: 41.8217%; height: 20px;&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt; mannwhitneyu()&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 20.0775%; height: 20px;&quot;&gt;&lt;b&gt;Paired-samples t-test&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 38.1007%; height: 20px;&quot;&gt;Wilcoxon signed-rank test&lt;/td&gt;
&lt;td style=&quot;width: 41.8217%; height: 20px;&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt; wilcoxon()&lt;/span&gt; &lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 20px;&quot;&gt;
&lt;td style=&quot;width: 20.0775%; height: 20px;&quot;&gt;&lt;b&gt;One-sample t-test&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;width: 38.1007%; height: 20px;&quot;&gt;One-sample Wilcoxon test&lt;/td&gt;
&lt;td style=&quot;width: 41.8217%; height: 20px;&quot;&gt;
&lt;div&gt;&lt;b&gt;&lt;span style=&quot;background-color: #dddddd;&quot; data-token-index=&quot;0&quot;&gt;wilcoxon(data - reference_value)&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
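&lt;p data-ke-size=&quot;size16&quot;&gt;A rough sketch of the three calls in the table, run on randomly generated skewed data. Every array and the reference value of 1.0 are hypothetical; only the scipy.stats function names come from the table.&lt;/p&gt;

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical skewed data: two independent groups, and a before/after pair.
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=2.0, size=30)
before = rng.exponential(scale=2.0, size=30)
after = before - rng.exponential(scale=0.5, size=30)
reference_value = 1.0  # hypothetical benchmark for the one-sample case

# Independent samples: Mann-Whitney U test.
p_independent = float(stats.mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue)
# Paired samples: Wilcoxon signed-rank test on the pairwise differences.
p_paired = float(stats.wilcoxon(before, after).pvalue)
# One sample against a reference: the same test applied to (data - reference).
p_one_sample = float(stats.wilcoxon(group_a - reference_value).pvalue)

print(p_independent, p_paired, p_one_sample)
```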
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&amp;nbsp;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  실무 팁&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1️⃣ &lt;b&gt;Visualize first&lt;/b&gt; : check the distribution of the data before testing (box plot, histogram, Q-Q plot)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Combine a Q-Q plot with Shapiro-Wilk&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Use a quantitative test alongside the visual check: &lt;b&gt;stats.shapiro()&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;When unsure whether the variances are equal, Welch&amp;rsquo;s t-test, &lt;b&gt;stats.ttest_ind(equal_var=False)&lt;/b&gt;, is the safer choice&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ p-value + effect size&lt;/b&gt; : assess statistical significance and practical importance together&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ When in doubt, go nonparametric&lt;/b&gt; : if normality is questionable, a nonparametric test is the safer choice&lt;/p&gt;
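&lt;p data-ke-size=&quot;size16&quot;&gt;Tips 2 and 3 can be sketched together as follows. The two groups are synthetic (sizes, means, and spreads are assumptions), and Cohen&amp;rsquo;s d with a pooled standard deviation is just one common effect-size choice.&lt;/p&gt;

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical groups with different sizes and different spreads
x = rng.normal(loc=10.0, scale=1.0, size=40)
y = rng.normal(loc=10.8, scale=2.5, size=25)

# Tip 2: a quantitative normality check to read next to the Q-Q plot
sw_x = stats.shapiro(x)
sw_y = stats.shapiro(y)

# Welch's t-test does not assume equal variances
welch = stats.ttest_ind(x, y, equal_var=False)

# Tip 3: report an effect size with the p-value (Cohen's d, pooled SD)
nx, ny = len(x), len(y)
pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
cohens_d = (x.mean() - y.mean()) / pooled_sd

print(f'Shapiro-Wilk p: x={sw_x.pvalue:.4f}, y={sw_y.pvalue:.4f}')
print(f'Welch t={welch.statistic:.3f}, p={welch.pvalue:.4f}, Cohen d={cohens_d:.3f}')
```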
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  코드&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import shapiro, levene, ttest_ind, ttest_rel, ttest_1samp
from scipy.stats import mannwhitneyu, wilcoxon
from sklearn.datasets import load_wine, load_iris, load_diabetes
import warnings
import platform

# Silence all warnings to keep notebook output tidy (this also hides useful ones)
warnings.filterwarnings('ignore')

# Pick a font with Korean glyphs for each operating system
if platform.system() == 'Windows':
    plt.rcParams['font.family'] = 'Malgun Gothic'
elif platform.system() == 'Darwin':  # macOS
    plt.rcParams['font.family'] = 'AppleGothic'
else:  # Linux
    plt.rcParams['font.family'] = 'NanumGothic'

# Keep the minus sign from rendering as a broken glyph
plt.rcParams['axes.unicode_minus'] = False

# Default figure size
plt.rcParams['figure.figsize'] = (12, 4)

# Global seed for reproducibility
np.random.seed(42)

print(&quot;=&quot;*50)
print(&quot;Libraries loaded!&quot;)
print(&quot;Korean font configured!&quot;)
print(&quot;=&quot;*50)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&amp;nbsp;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  정규성 판단 도우미 함수&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Helper that decides whether a sample can be treated as normal
def check_normality_simple(data, name=&quot;data&quot;):
    # Accept any 1-D array-like by converting it to a Series first
    data = pd.Series(data)

    # Report and drop NaN values
    if data.isna().any():
        print(f&quot;⚠️ Warning: {name} contains {data.isna().sum()} NaN value(s)&quot;)
        data = data.dropna()
        print(f&quot;   &amp;rarr; n={len(data)} after dropping NaN&quot;)

    n = len(data)

    print(f&quot;\n[{name} normality check] n={n}&quot;)
    print(&quot;-&quot;*40)

    # Skewness and excess kurtosis (fisher=True gives 0 for a normal distribution)
    skew = stats.skew(data)
    kurt = stats.kurtosis(data, fisher=True)
    print(f&quot;Skewness: {skew:.3f}&quot;)
    print(f&quot;Excess kurtosis: {kurt:.3f}&quot;)

    # The decision rule depends on the sample size
    if n &amp;lt; 30:
        stat, p = shapiro(data)
        print(f&quot;Shapiro-Wilk p-value: {p:.4f}&quot;)
        is_normal = p &amp;gt; 0.05
        reason = f&quot;Shapiro p={'&amp;gt;' if is_normal else '&amp;le;'}0.05&quot;
    elif n &amp;lt; 100:
        if abs(skew) &amp;lt; 1 and abs(kurt) &amp;lt; 2:
            is_normal = True
            reason = &quot;|skewness|&amp;lt;1, |kurtosis|&amp;lt;2&quot;
        else:
            stat, p = shapiro(data)
            print(f&quot;Follow-up Shapiro-Wilk p-value: {p:.4f}&quot;)
            is_normal = p &amp;gt; 0.05
            reason = f&quot;Shapiro p={'&amp;gt;' if is_normal else '&amp;le;'}0.05&quot;
    else:
        is_normal = abs(skew) &amp;lt; 2
        reason = f&quot;|skewness|{'&amp;lt;' if is_normal else '&amp;ge;'}2 (central limit theorem)&quot;

    print(f&quot;Result: {'✅ normality assumption met' if is_normal else '❌ normality assumption violated'} ({reason})&quot;)
    return is_normal
&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Parameters
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;data : data to test for normality (NaN values are dropped automatically)&lt;/li&gt;
&lt;li&gt;name : label shown in the output. str, default=&amp;rdquo;data&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Returns : bool
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;True : normality can be assumed (use a parametric test)&lt;/li&gt;
&lt;li&gt;False : normality is violated (use a nonparametric test)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Decision criteria
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;n &amp;lt; 30 : Shapiro-Wilk test (p &amp;gt; 0.05)&lt;/li&gt;
&lt;li&gt;30 &amp;le; n &amp;lt; 100 : skewness/kurtosis first (|skewness| &amp;lt; 1 and |excess kurtosis| &amp;lt; 2), Shapiro-Wilk otherwise&lt;/li&gt;
&lt;li&gt;n &amp;ge; 100 : skewness only (|skewness| &amp;lt; 2, relying on the central limit theorem)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
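&lt;p data-ke-size=&quot;size16&quot;&gt;To get a feel for the quantities these criteria look at, the sketch below compares a normal sample with a clearly skewed one. Both samples are synthetic assumptions (an exponential distribution has theoretical skewness 2 and excess kurtosis 6).&lt;/p&gt;

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical samples: one normal, one right-skewed
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_sample = rng.exponential(scale=1.0, size=500)

for label, sample in [('normal', normal_sample), ('exponential', skewed_sample)]:
    skew = stats.skew(sample)
    kurt = stats.kurtosis(sample, fisher=True)  # excess kurtosis, 0 for a normal distribution
    p = stats.shapiro(sample).pvalue
    print(f'{label:12s} skew={skew:6.3f}  excess kurtosis={kurt:6.3f}  Shapiro p={p:.4g}')
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With large samples Shapiro-Wilk flags even small departures from normality, which is one reason a size-dependent rule such as the one above falls back on skewness for large n.&lt;/p&gt;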
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  실습 01 : Wine 데이터로 독립표본 t-test&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❓ 클래스 0과 클래스1 와인의 알코올 도수에 차이 있는가 ❓&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*60)
print(&quot;Exercise 1: Wine data - independent-samples t-test&quot;)
print(&quot;=&quot;*60)

# Load the Wine data
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['class'] = wine.target

print(f&quot;\nData shape: {wine_df.shape}&quot;)
print(f&quot;Classes: {wine.target_names.tolist()}&quot;)
print(&quot;\nFeatures (first 5):&quot;)
for i, feature in enumerate(wine.feature_names[:5]):
    print(f&quot;  {i+1}. {feature}&quot;)

# Alcohol content of class 0 and class 1
class0_alcohol = wine_df[wine_df['class'] == 0]['alcohol']
class1_alcohol = wine_df[wine_df['class'] == 1]['alcohol']

# Descriptive statistics table
stats_table = pd.DataFrame({
    'Group': ['Class 0', 'Class 1'],
    'n': [len(class0_alcohol), len(class1_alcohol)],
    'Mean': [class0_alcohol.mean(), class1_alcohol.mean()],
    'Std': [class0_alcohol.std(), class1_alcohol.std()],
    'Min': [class0_alcohol.min(), class1_alcohol.min()],
    'Max': [class0_alcohol.max(), class1_alcohol.max()]
})

print(&quot;\n[Alcohol content: descriptive statistics]&quot;)
display(stats_table.round(2))  # display() is available in Jupyter/IPython; use print() elsewhere
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;780&quot; data-origin-height=&quot;499&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cHueZa/btsQ0up3RJ8/ZvPd1m6ykehrXLzI7b5hd1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cHueZa/btsQ0up3RJ8/ZvPd1m6ykehrXLzI7b5hd1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cHueZa/btsQ0up3RJ8/ZvPd1m6ykehrXLzI7b5hd1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcHueZa%2FbtsQ0up3RJ8%2FZvPd1m6ykehrXLzI7b5hd1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;780&quot; height=&quot;499&quot; data-origin-width=&quot;780&quot; data-origin-height=&quot;499&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Box plot (Matplotlib 3.9+ prefers tick_labels= over labels=)
bp = axes[0].boxplot([class0_alcohol, class1_alcohol], 
                      labels=['Class 0', 'Class 1'],
                      patch_artist=True)
bp['boxes'][0].set_facecolor(&quot;#aed6df&quot;)
bp['boxes'][1].set_facecolor(&quot;#fea188&quot;)
axes[0].set_ylabel('Alcohol content')
axes[0].set_title('Alcohol content distribution')
axes[0].grid(True, alpha=0.3)

# Histogram
axes[1].hist(class0_alcohol, bins=10, alpha=0.6, label='Class 0', 
             color=&quot;#0063b2&quot;, density=True, edgecolor='black')
axes[1].hist(class1_alcohol, bins=10, alpha=0.6, label='Class 1', 
             color=&quot;#e94b3c&quot;, density=True, edgecolor='black')
axes[1].set_xlabel('Alcohol content')
axes[1].set_ylabel('Density')
axes[1].set_title('Alcohol content by class')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Q-Q plot (Class 0)
# stats.probplot() offers no color options, so call it without plot= (nothing is drawn)
# and draw the returned quantiles and fitted line manually.
(osm, osr), (slope, intercept, r) = stats.probplot(class0_alcohol, dist=&quot;norm&quot;, fit=True)
axes[2].scatter(osm, osr, color='#0063b2', marker='o', alpha=0.7)  # sample vs. theoretical quantiles
axes[2].plot(osm, slope * osm + intercept, color='#e94b3c', linewidth=2)  # least-squares fit line
axes[2].set_xlabel('Theoretical quantiles')
axes[2].set_ylabel('Ordered values')
axes[2].set_title('Q-Q Plot (Class 0)')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1159&quot; data-origin-height=&quot;467&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bCfyfv/btsQ3e0tbvt/I4fG0D7EYa9XC5nHgvWB9K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bCfyfv/btsQ3e0tbvt/I4fG0D7EYa9XC5nHgvWB9K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bCfyfv/btsQ3e0tbvt/I4fG0D7EYa9XC5nHgvWB9K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbCfyfv%2FbtsQ3e0tbvt%2FI4fG0D7EYa9XC5nHgvWB9K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1159&quot; height=&quot;467&quot; data-origin-width=&quot;1159&quot; data-origin-height=&quot;467&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Check normality according to sample size&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*50)
print(&quot;Hypothesis testing process&quot;)
print(&quot;=&quot;*50)

# Step 1: normality check for each group
is_normal_0 = check_normality_simple(class0_alcohol, &quot;Class 0 alcohol&quot;)
is_normal_1 = check_normality_simple(class1_alcohol, &quot;Class 1 alcohol&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;669&quot; data-origin-height=&quot;387&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vTqFo/btsQ3zXyhAF/IMArh6mZZxXynMTbcbkop0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vTqFo/btsQ3zXyhAF/IMArh6mZZxXynMTbcbkop0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vTqFo/btsQ3zXyhAF/IMArh6mZZxXynMTbcbkop0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvTqFo%2FbtsQ3zXyhAF%2FIMArh6mZZxXynMTbcbkop0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;669&quot; height=&quot;387&quot; data-origin-width=&quot;669&quot; data-origin-height=&quot;387&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Test for equal variances (independent samples only): stats.levene&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Step 2: equal-variance test
print(&quot;\n[Equal-variance test]&quot;)
print(&quot;-&quot;*40)
stat, p_levene = levene(class0_alcohol, class1_alcohol)
print(f&quot;Levene's test p-value: {p_levene:.4f}&quot;)
equal_var = p_levene &amp;gt; 0.05
print(f&quot;Result: {'✅ equal variances assumed' if equal_var else '❌ unequal variances &amp;rarr; use Welch t-test'}&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;333&quot; data-origin-height=&quot;138&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yr2NY/btsQ3CNneTj/7rKkgI3UYbhk7DExrMPoHK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yr2NY/btsQ3CNneTj/7rKkgI3UYbhk7DExrMPoHK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yr2NY/btsQ3CNneTj/7rKkgI3UYbhk7DExrMPoHK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fyr2NY%2FbtsQ3CNneTj%2F7rKkgI3UYbhk7DExrMPoHK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;333&quot; height=&quot;138&quot; data-origin-width=&quot;333&quot; data-origin-height=&quot;138&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Hypothesis test: ttest_ind(equal_var=equal_var)&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;5️⃣ Draw a conclusion&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# =============================================================================
# Step 3: hypothesis test
# =============================================================================
print(&quot;\n[Hypothesis test]&quot;)
print(&quot;-&quot;*40)

# State the hypotheses
print(&quot;H₀: &amp;mu;₀ = &amp;mu;₁ (the two classes have the same mean alcohol content)&quot;)
print(&quot;H₁: &amp;mu;₀ &amp;ne; &amp;mu;₁ (the two classes differ in mean alcohol content)&quot;)
print(&quot;Significance level: &amp;alpha; = 0.05&quot;)

# -----------------------------------------------------------------------------
# 3-1. Choose and run the test
# -----------------------------------------------------------------------------
# Parametric or nonparametric, depending on the normality results
if is_normal_0 and is_normal_1:
    # Parametric: independent-samples t-test (both groups treated as normal)
    t_stat, p_value = ttest_ind(class0_alcohol, class1_alcohol, equal_var=equal_var)
    test_name = &quot;Student's t-test&quot; if equal_var else &quot;Welch's t-test&quot;
    print(f&quot;\n{test_name} result:&quot;)
    print(f&quot;t = {t_stat:.4f}, p = {p_value:.4f}&quot;)

    # Cohen's d effect size (standardized mean difference)
    # d = (mean0 - mean1) / pooled standard deviation
    # The pooled variance weights each group by its degrees of freedom,
    # which matters here because the two classes differ in size.
    n0, n1 = len(class0_alcohol), len(class1_alcohol)
    pooled_std = np.sqrt(((n0 - 1) * class0_alcohol.var() + (n1 - 1) * class1_alcohol.var())
                         / (n0 + n1 - 2))
    cohens_d = (class0_alcohol.mean() - class1_alcohol.mean()) / pooled_std
    abs_d = abs(cohens_d)

    # Conventional interpretation thresholds for Cohen's d
    if abs_d &amp;lt; 0.2:
        effect = &quot;negligible effect&quot;
    elif abs_d &amp;lt; 0.5:
        effect = &quot;small effect&quot;
    elif abs_d &amp;lt; 0.8:
        effect = &quot;medium effect&quot;
    else:
        effect = &quot;large effect&quot;

    print(f&quot;Cohen's d = {cohens_d:.3f} ({effect})&quot;)

else:
    # Nonparametric: Mann-Whitney U test (normality assumption violated)
    # Rank-based: tests whether one group tends to have larger values than the other
    # (a comparison of medians only if both distributions have the same shape)
    u_stat, p_value = mannwhitneyu(class0_alcohol, class1_alcohol, alternative='two-sided')
    print(f&quot;\nMann-Whitney U test result:&quot;)
    print(f&quot;U = {u_stat:.4f}, p = {p_value:.4f}&quot;)

# -----------------------------------------------------------------------------
# 3-2. Statistical conclusion
# -----------------------------------------------------------------------------
print(f&quot;\n[Conclusion]&quot;)

# Compare the p-value with the significance level (&amp;alpha;=0.05)
if p_value &amp;lt; 0.05:
    print(f&quot;✅ p-value({p_value:.4f}) &amp;lt; 0.05 &amp;rarr; reject H₀&quot;)
    print(f&quot;   The two classes differ significantly in alcohol content&quot;)
    print(f&quot;   (the difference is statistically significant)&quot;)
else:
    print(f&quot;❌ p-value({p_value:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject H₀&quot;)
    print(f&quot;   No significant difference in alcohol content between the two classes&quot;)
    print(f&quot;   (the observed difference may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;429&quot; data-origin-height=&quot;363&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bkcbSK/btsQ2xeTShY/tkMCi0K3KSBdMlE8yizMHK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bkcbSK/btsQ2xeTShY/tkMCi0K3KSBdMlE8yizMHK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bkcbSK/btsQ2xeTShY/tkMCi0K3KSBdMlE8yizMHK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbkcbSK%2FbtsQ2xeTShY%2FtkMCi0K3KSBdMlE8yizMHK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;429&quot; height=&quot;363&quot; data-origin-width=&quot;429&quot; data-origin-height=&quot;363&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
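&lt;p data-ke-size=&quot;size16&quot;&gt;The Mann-Whitney branch above prints no effect size. One common option, shown here as a sketch and not as part of the original exercise, is the rank-biserial correlation derived from U. The two samples below are synthetic assumptions, and the code assumes SciPy 1.7 or later, where the returned statistic is U for the first sample.&lt;/p&gt;

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical continuous samples (no ties)
x = rng.normal(loc=0.8, scale=1.0, size=20)
y = rng.normal(loc=0.0, scale=1.0, size=25)

res = stats.mannwhitneyu(x, y, alternative='two-sided')

# U for the first sample counts the (x, y) pairs in which x is the larger value
n_x, n_y = len(x), len(y)
cles = res.statistic / (n_x * n_y)  # common-language effect size: P(X greater than Y)
rank_biserial = 2 * cles - 1        # ranges from -1 to 1; 0 means no tendency

print(f'U={res.statistic:.1f}, p={res.pvalue:.4f}')
print(f'P(X greater than Y)={cles:.3f}, rank-biserial r={rank_biserial:.3f}')
```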
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  Exercise 02 : Paired-Samples t-test on the Diabetes Data&lt;/b&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❓ Does the blood glucose level change between before and after treatment? ❓&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*60)
print(&quot;Exercise 2: Diabetes data - paired-samples t-test&quot;)
print(&quot;=&quot;*60)

# Load the Diabetes data and build simulated before/after measurements
diabetes = load_diabetes()

# &quot;Before treatment&quot; values for 30 patients.
# Note: diabetes.target is a disease-progression score, not blood glucose,
# and it is not standardized; it only serves as a stand-in here.
n_patients = 30
before_glucose = diabetes.target[:n_patients]

# &quot;After treatment&quot; values: simulated so that they decrease on average
treatment_effect = np.random.normal(-15, 5, n_patients)  # mean change -15, SD 5
after_glucose = before_glucose + treatment_effect

# Build the DataFrame. Column names are kept in Korean:
# 환자ID = patient ID, 치료전 = before treatment, 치료후 = after treatment, 변화량 = change
treatment_df = pd.DataFrame({
    '환자ID': [f'P{i:03d}' for i in range(1, n_patients+1)],
    '치료전': before_glucose,
    '치료후': after_glucose,
    '변화량': after_glucose - before_glucose
})

print(&quot;\n[Data sample (first 5 patients)]&quot;)
display(treatment_df.head())

print(&quot;\n[Descriptive statistics]&quot;)
stats_summary = pd.DataFrame({
    'Measure': ['Before', 'After', 'Change'],
    'Mean': [treatment_df['치료전'].mean(), 
             treatment_df['치료후'].mean(),
             treatment_df['변화량'].mean()],
    'Std': [treatment_df['치료전'].std(),
            treatment_df['치료후'].std(),
            treatment_df['변화량'].std()]
})
display(stats_summary.round(2))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;791&quot; data-origin-height=&quot;587&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/DR1w2/btsQ3BVjowM/w1Ko0lVHOKaXTINqvSQ831/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/DR1w2/btsQ3BVjowM/w1Ko0lVHOKaXTINqvSQ831/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/DR1w2/btsQ3BVjowM/w1Ko0lVHOKaXTINqvSQ831/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDR1w2%2FbtsQ3BVjowM%2Fw1Ko0lVHOKaXTINqvSQ831%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;791&quot; height=&quot;587&quot; data-origin-width=&quot;791&quot; data-origin-height=&quot;587&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Before-after line plot, one gray line per patient
for i in range(len(treatment_df)):
    axes[0].plot([0, 1], [treatment_df.iloc[i]['치료전'], treatment_df.iloc[i]['치료후']], 
                'gray', alpha=0.4, linewidth=0.8)
axes[0].plot([0, 1], [treatment_df['치료전'].mean(), treatment_df['치료후'].mean()], 
            'red', linewidth=3, marker='o', markersize=8, label='Mean')
axes[0].set_xticks([0, 1])
axes[0].set_xticklabels(['Before', 'After'])
axes[0].set_ylabel('Blood glucose')
axes[0].set_title('Change per patient')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Box plot (Matplotlib 3.9+ prefers tick_labels= over labels=)
bp = axes[1].boxplot([treatment_df['치료전'], treatment_df['치료후']], 
                     labels=['Before', 'After'],
                     patch_artist=True)
bp['boxes'][0].set_facecolor(&quot;#ead98b&quot;)
bp['boxes'][1].set_facecolor(&quot;#7dd0b6&quot;)
axes[1].set_ylabel('Blood glucose')
axes[1].set_title('Blood glucose distribution')
axes[1].grid(True, alpha=0.3)

# Histogram of the change (after - before)
axes[2].hist(treatment_df['변화량'], bins=10, edgecolor='black', alpha=0.7, color=&quot;#93c763&quot;)
axes[2].axvline(0, color='red', linestyle='--', linewidth=2, label='No change')
axes[2].axvline(treatment_df['변화량'].mean(), color='blue', linestyle='--', 
               linewidth=2, label=f'Mean: {treatment_df[&quot;변화량&quot;].mean():.1f}')
axes[2].set_xlabel('Change in blood glucose')
axes[2].set_ylabel('Frequency')
axes[2].set_title('Distribution of change')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1390&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b4Enzp/btsQ1nYuO0V/SK32bmgNgofy097a4rvtCk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b4Enzp/btsQ1nYuO0V/SK32bmgNgofy097a4rvtCk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b4Enzp/btsQ1nYuO0V/SK32bmgNgofy097a4rvtCk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb4Enzp%2FbtsQ1nYuO0V%2FSK32bmgNgofy097a4rvtCk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1390&quot; height=&quot;490&quot; data-origin-width=&quot;1390&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Check normality according to sample size&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*50)
print(&quot;Hypothesis testing process&quot;)
print(&quot;=&quot;*50)

# Step 1: normality check on the change (after - before), not on each measurement
is_normal_diff = check_normality_simple(treatment_df['변화량'], &quot;Change&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;665&quot; data-origin-height=&quot;245&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/L8ClH/btsQ1wnLwtz/DMDfnOoQKK84phQIZm0781/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/L8ClH/btsQ1wnLwtz/DMDfnOoQKK84phQIZm0781/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/L8ClH/btsQ1wnLwtz/DMDfnOoQKK84phQIZm0781/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FL8ClH%2FbtsQ1wnLwtz%2FDMDfnOoQKK84phQIZm0781%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;665&quot; height=&quot;245&quot; data-origin-width=&quot;665&quot; data-origin-height=&quot;245&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Hypothesis test: ttest_rel()&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Draw a conclusion&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# =============================================================================
# Step 2: hypothesis test
# =============================================================================
print(&quot;\n[Hypothesis test]&quot;)
print(&quot;-&quot;*40)

# State the hypotheses (paired samples)
print(&quot;H₀: &amp;mu;_before = &amp;mu;_after (no treatment effect)&quot;)
print(&quot;H₁: &amp;mu;_before &amp;ne; &amp;mu;_after (treatment effect exists)&quot;)
print(&quot;Significance level: &amp;alpha; = 0.05&quot;)

# -----------------------------------------------------------------------------
# 2-1. Choose and run the test
# -----------------------------------------------------------------------------
# Parametric or nonparametric, depending on the normality of the differences
if is_normal_diff:
    # Parametric: paired t-test (differences treated as normal)
    # The same subjects are measured twice, so the pairs are not independent
    t_stat, p_value = ttest_rel(treatment_df['치료전'], treatment_df['치료후'])
    print(f&quot;\nPaired t-test result:&quot;)
    print(f&quot;t = {t_stat:.4f}, p = {p_value:.4f}&quot;)

    # -------------------------------------------------------------------------
    # 2-2. Effect size (Cohen's d for paired samples)
    # -------------------------------------------------------------------------
    # d = mean change / standard deviation of the change
    # 변화량 is after - before, so a negative d means a decrease
    cohens_d = treatment_df['변화량'].mean() / treatment_df['변화량'].std()
    abs_d = abs(cohens_d)

    # Conventional interpretation thresholds for Cohen's d
    if abs_d &amp;lt; 0.2:
        effect = &quot;negligible effect&quot;
    elif abs_d &amp;lt; 0.5:
        effect = &quot;small effect&quot;
    elif abs_d &amp;lt; 0.8:
        effect = &quot;medium effect&quot;
    else:
        effect = &quot;large effect&quot;

    print(f&quot;Cohen's d = {cohens_d:.3f} ({effect})&quot;)

    # -------------------------------------------------------------------------
    # 2-3. Confidence interval
    # -------------------------------------------------------------------------
    # 95% confidence interval for the mean change
    # CI = mean &amp;plusmn; t(&amp;alpha;/2, df) &amp;times; SE
    confidence = 0.95  # confidence level
    n = len(treatment_df)  # sample size
    mean_diff = treatment_df['변화량'].mean()  # mean change
    se_diff = stats.sem(treatment_df['변화량'])  # standard error of the mean change

    # t-distribution interval (degrees of freedom = n-1)
    ci = stats.t.interval(confidence, n-1, loc=mean_diff, scale=se_diff)
    print(f&quot;95% CI for the mean change: [{ci[0]:.2f}, {ci[1]:.2f}]&quot;)

else:
    # Nonparametric: Wilcoxon signed-rank test (normality assumption violated)
    # Uses the signs and ranks of the paired differences to test
    # whether they are centered on zero
    w_stat, p_value = wilcoxon(treatment_df['치료전'], treatment_df['치료후'])
    print(f&quot;\nWilcoxon signed-rank test result:&quot;)
    print(f&quot;W = {w_stat:.4f}, p = {p_value:.4f}&quot;)

# -----------------------------------------------------------------------------
# 2-4. Statistical conclusion
# -----------------------------------------------------------------------------
print(f&quot;\n[Conclusion]&quot;)

# Compare the p-value with the significance level (&amp;alpha;=0.05)
if p_value &amp;lt; 0.05:
    mean_change = treatment_df['변화량'].mean()
    direction = &quot;decrease&quot; if mean_change &amp;lt; 0 else &quot;increase&quot;
    print(f&quot;✅ p-value({p_value:.4f}) &amp;lt; 0.05 &amp;rarr; reject H₀&quot;)
    print(f&quot;   The treatment has an effect (mean {direction} of {abs(mean_change):.1f})&quot;)
    print(f&quot;   (the change is statistically significant)&quot;)
else:
    print(f&quot;❌ p-value({p_value:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject H₀&quot;)
    print(f&quot;   The treatment effect is not significant&quot;)
    print(f&quot;   (the observed change may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;400&quot; data-origin-height=&quot;393&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cjdYVn/btsQ3XKGfYQ/uYXYQzl5NkrY89X9qpQolK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cjdYVn/btsQ3XKGfYQ/uYXYQzl5NkrY89X9qpQolK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cjdYVn/btsQ3XKGfYQ/uYXYQzl5NkrY89X9qpQolK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcjdYVn%2FbtsQ3XKGfYQ%2FuYXYQzl5NkrY89X9qpQolK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;393&quot; data-origin-width=&quot;400&quot; data-origin-height=&quot;393&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
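&lt;p data-ke-size=&quot;size16&quot;&gt;A paired t-test is a one-sample t-test on the differences, which is why only the change needed a normality check. The sketch below illustrates this with simulated values (all numbers are assumptions) and also shows a one-sided variant, assuming SciPy 1.6 or later for the alternative argument.&lt;/p&gt;

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical paired measurements: a baseline and a follow-up about 15 lower
before = rng.normal(loc=150.0, scale=20.0, size=30)
after = before + rng.normal(loc=-15.0, scale=5.0, size=30)

# Paired t-test and the equivalent one-sample t-test on the differences
paired = stats.ttest_rel(before, after)
one_sample = stats.ttest_1samp(before - after, popmean=0.0)

# One-sided version: the alternative is that the mean of before exceeds the mean of after
one_sided = stats.ttest_rel(before, after, alternative='greater')

print(f'paired     : t={paired.statistic:.4f}, p={paired.pvalue:.3g}')
print(f'one-sample : t={one_sample.statistic:.4f}, p={one_sample.pvalue:.3g}')
print(f'one-sided  : p={one_sided.pvalue:.3g}')
```

&lt;p data-ke-size=&quot;size16&quot;&gt;When the direction of the effect is fixed before looking at the data, the one-sided p-value is half the two-sided one for a statistic in the expected direction.&lt;/p&gt;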
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  실습 03 : Iris 데이터로 단일표본 t-test&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❓ Setosa 종의 평균 꽃받침 길이가 5.0cm인가 검정 ❓&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*60)
print(&quot;Exercise 3: Iris data - one-sample t-test&quot;)
print(&quot;=&quot;*60)

# Load the Iris data
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target

# Sepal length of the Setosa species (target 0)
setosa_sepal = iris_df[iris_df['species'] == 0]['sepal length (cm)']
target_value = 5.0  # reference value to test against

print(f&quot;[Setosa sepal length]&quot;)
print(f&quot;Sample size: {len(setosa_sepal)}&quot;)
print(f&quot;Mean: {setosa_sepal.mean():.3f}cm&quot;)
print(f&quot;Standard deviation: {setosa_sepal.std():.3f}cm&quot;)
print(f&quot;Median: {setosa_sepal.median():.3f}cm&quot;)
print(f&quot;Reference value: {target_value}cm&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;814&quot; data-origin-height=&quot;243&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lggie/btsQ0456QWq/cjCpQgW1Z9clfFKzo8pKX1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lggie/btsQ0456QWq/cjCpQgW1Z9clfFKzo8pKX1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lggie/btsQ0456QWq/cjCpQgW1Z9clfFKzo8pKX1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Flggie%2FbtsQ0456QWq%2FcjCpQgW1Z9clfFKzo8pKX1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;814&quot; height=&quot;243&quot; data-origin-width=&quot;814&quot; data-origin-height=&quot;243&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;jboss-cli&quot;&gt;&lt;code&gt;# Visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Histogram
axes[0].hist(setosa_sepal, bins=12, edgecolor='black', alpha=0.7, color=&quot;#aed6df&quot;)
axes[0].axvline(target_value, color='red', linestyle='--', linewidth=2, label=f'Reference: {target_value}cm')
axes[0].axvline(setosa_sepal.mean(), color='green', linestyle='--', linewidth=2,
                label=f'Mean: {setosa_sepal.mean():.2f}cm')
axes[0].set_xlabel('Sepal length (cm)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Setosa sepal length distribution')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Box plot
bp = axes[1].boxplot(setosa_sepal, patch_artist=True)
bp['boxes'][0].set_facecolor(&quot;#aed6df&quot;)
axes[1].axhline(target_value, color='red', linestyle='--', linewidth=2)
axes[1].set_ylabel('Sepal length (cm)')
axes[1].set_title('Box plot')
axes[1].text(1.1, target_value+0.05, f'Reference: {target_value}', color='red')
axes[1].grid(True, alpha=0.3)

# Q-Q plot
stats.probplot(setosa_sepal, dist=&quot;norm&quot;, plot=axes[2])
axes[2].set_title('Q-Q Plot')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1390&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/kGU7m/btsQ2ovJcdE/AVQKkU3GB9BHcz21KqgKrK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/kGU7m/btsQ2ovJcdE/AVQKkU3GB9BHcz21KqgKrK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/kGU7m/btsQ2ovJcdE/AVQKkU3GB9BHcz21KqgKrK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FkGU7m%2FbtsQ2ovJcdE%2FAVQKkU3GB9BHcz21KqgKrK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1390&quot; height=&quot;490&quot; data-origin-width=&quot;1390&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Check normality according to sample size&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*50)
print(&quot;Hypothesis testing process&quot;)
print(&quot;=&quot;*50)

# Step 1: normality test
is_normal = check_normality_simple(setosa_sepal, &quot;Setosa sepal length&quot;)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;661&quot; data-origin-height=&quot;244&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bhvp3W/btsQ1vPXU7H/YYgtZ5knJ0pzErrGHa2Y5k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bhvp3W/btsQ1vPXU7H/YYgtZ5knJ0pzErrGHa2Y5k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bhvp3W/btsQ1vPXU7H/YYgtZ5knJ0pzErrGHa2Y5k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbhvp3W%2FbtsQ1vPXU7H%2FYYgtZ5knJ0pzErrGHa2Y5k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;661&quot; height=&quot;244&quot; data-origin-width=&quot;661&quot; data-origin-height=&quot;244&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Hypothesis test: ttest_1samp&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Conclusion&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# =============================================================================
# Step 2: Hypothesis test
# =============================================================================
print(&quot;\n[Hypothesis test]&quot;)
print(&quot;-&quot;*40)

# State the hypotheses (one-sample test)
# Test whether the population mean equals a specific value (target_value)
print(f&quot;H₀: &amp;mu; = {target_value}cm (the mean is {target_value}cm)&quot;)
print(f&quot;H₁: &amp;mu; &amp;ne; {target_value}cm (the mean is not {target_value}cm)&quot;)
print(&quot;Significance level: &amp;alpha; = 0.05&quot;)

# -----------------------------------------------------------------------------
# 2-1. Choose and run the test
# -----------------------------------------------------------------------------
# Pick a parametric or non-parametric test based on the normality result
if is_normal:
    # Parametric test: one-sample t-test (data look normally distributed)
    # Compare the sample mean with the hypothesized population mean (target_value)
    t_stat, p_value = ttest_1samp(setosa_sepal, target_value)
    print(f&quot;\nOne-sample t-test result:&quot;)
    print(f&quot;t = {t_stat:.4f}, p = {p_value:.4f}&quot;)

    # -------------------------------------------------------------------------
    # 2-2. Compute and interpret the confidence interval
    # -------------------------------------------------------------------------
    # Estimate the 95% confidence interval for the population mean
    # CI = sample mean &amp;plusmn; t(&amp;alpha;/2, df) &amp;times; SE
    confidence = 0.95  # confidence level
    n = len(setosa_sepal)  # sample size
    mean = setosa_sepal.mean()  # sample mean
    se = stats.sem(setosa_sepal)  # standard error (SE = s/&amp;radic;n)

    # Confidence interval from the t-distribution (degrees of freedom = n-1)
    ci = stats.t.interval(confidence, n-1, loc=mean, scale=se)
    print(f&quot;95% CI for the mean: [{ci[0]:.3f}, {ci[1]:.3f}]cm&quot;)

    # Compare the confidence interval with the reference value
    # If the reference value lies inside the interval, H₀ cannot be rejected
    if ci[0] &amp;lt;= target_value &amp;lt;= ci[1]:
        print(f&quot;&amp;rarr; {target_value}cm is inside the confidence interval&quot;)
    else:
        print(f&quot;&amp;rarr; {target_value}cm is outside the confidence interval&quot;)

else:
    # Non-parametric test: Wilcoxon signed-rank test (normality assumption violated)
    # Tests whether the median equals the reference value (based on ranks and signs)

    # Difference between each observation and the reference value
    differences = setosa_sepal - target_value

    # Run the Wilcoxon signed-rank test
    # Ranks the absolute differences and tests them with their signs taken into account
    w_stat, p_value = wilcoxon(differences)
    print(f&quot;\nWilcoxon signed-rank test result:&quot;)
    print(f&quot;W = {w_stat:.4f}, p = {p_value:.4f}&quot;)

# -----------------------------------------------------------------------------
# 2-3. Statistical conclusion
# -----------------------------------------------------------------------------
print(f&quot;\n[Conclusion]&quot;)

# Compare the p-value with the significance level (&amp;alpha;=0.05) to decide whether to reject H₀
if p_value &amp;lt; 0.05:
    print(f&quot;✅ p-value({p_value:.4f}) &amp;lt; 0.05 &amp;rarr; reject the null hypothesis&quot;)

    # Check the direction of the difference (mean above or below the reference value)
    diff = setosa_sepal.mean() - target_value
    if diff &amp;gt; 0:
        print(f&quot;   The mean ({setosa_sepal.mean():.3f}cm) is significantly higher than {target_value}cm&quot;)
    else:
        print(f&quot;   The mean ({setosa_sepal.mean():.3f}cm) is significantly lower than {target_value}cm&quot;)
    print(f&quot;   (a statistically significant difference exists)&quot;)

else:
    print(f&quot;❌ p-value({p_value:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject the null hypothesis&quot;)
    print(f&quot;   No significant difference between the mean and {target_value}cm was detected&quot;)
    print(f&quot;   (the observed difference may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;406&quot; data-origin-height=&quot;405&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/uDIxW/btsQ14YRvxH/1LUw47lAjv7AVokPciwYhk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/uDIxW/btsQ14YRvxH/1LUw47lAjv7AVokPciwYhk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/uDIxW/btsQ14YRvxH/1LUw47lAjv7AVokPciwYhk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FuDIxW%2FbtsQ14YRvxH%2F1LUw47lAjv7AVokPciwYhk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;406&quot; height=&quot;405&quot; data-origin-width=&quot;406&quot; data-origin-height=&quot;405&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
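&lt;p data-ke-size=&quot;size16&quot;&gt;As a cross-check on the one-sample test above, the t statistic can be recomputed by hand as t = (sample mean - reference value) / (s / sqrt(n)) with n - 1 degrees of freedom. The sketch below is not from the original notebook: the helper name one_sample_t is hypothetical, and the sample is a synthetic stand-in (arbitrary seed), not the Iris data, so the printed numbers will differ from the screenshot.&lt;/p&gt;

```python
import numpy as np
from scipy import stats


def one_sample_t(sample, mu0):
    """Return (t, two-sided p) for H0: population mean == mu0 (hypothetical helper)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)        # standard error s / sqrt(n)
    t = (x.mean() - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)   # two-sided tail area of the t-distribution
    return t, p


rng = np.random.default_rng(0)                      # arbitrary seed
sample = rng.normal(loc=5.0, scale=0.35, size=50)   # synthetic stand-in, not the Iris data

t_manual, p_manual = one_sample_t(sample, 5.0)
t_scipy, p_scipy = stats.ttest_1samp(sample, 5.0)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Both lines should print the same values, which shows that ttest_1samp is the formula above with the sample standard deviation (ddof=1).&lt;/p&gt;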
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  실습 04 : Wine 데이터 다중 비교&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;# =============================================================================
# Extra practice: comparing several features in the Wine data
# =============================================================================
print(&quot;\n&quot; + &quot;=&quot;*60)
print(&quot;Extra practice: comparing several features in the Wine data&quot;)
print(&quot;=&quot;*60)

# -----------------------------------------------------------------------------
# 1. Choose the features to compare and test each one
# -----------------------------------------------------------------------------
# Compare several features between Class 0 and Class 1 in one pass
# Note: each test uses &amp;alpha; = 0.05 with no multiple-comparison correction
features_to_compare = ['malic_acid', 'ash', 'total_phenols', 'flavanoids']

# List that collects the results
results = []

# Run an independent-samples t-test for each feature
for feature in features_to_compare:
    # -------------------------------------------------------------------------
    # 1-1. Extract the data
    # -------------------------------------------------------------------------
    class0_data = wine_df[wine_df['class'] == 0][feature]
    class1_data = wine_df[wine_df['class'] == 1][feature]

    # -------------------------------------------------------------------------
    # 1-2. Equal-variance test (Levene's test)
    # -------------------------------------------------------------------------
    # Test whether the two groups have equal variances to choose the appropriate t-test
    _, p_levene = levene(class0_data, class1_data)
    equal_var = p_levene &amp;gt; 0.05  # assume equal variances when p &amp;gt; 0.05

    # -------------------------------------------------------------------------
    # 1-3. Run the independent-samples t-test
    # -------------------------------------------------------------------------
    # Runs Student's t-test or Welch's t-test depending on equal_var
    t_stat, p_value = ttest_ind(class0_data, class1_data, equal_var=equal_var)

    # -------------------------------------------------------------------------
    # 1-4. Effect size (Cohen's d)
    # -------------------------------------------------------------------------
    # Standardized mean difference = (mean1 - mean2) / standard deviation
    # The two variances are simply averaged here, which equals the pooled SD
    # only when both groups have the same size
    pooled_std = np.sqrt((class0_data.var() + class1_data.var()) / 2)
    cohens_d = (class0_data.mean() - class1_data.mean()) / pooled_std

    # Cohen's d interpretation thresholds
    abs_d = abs(cohens_d)
    if abs_d &amp;lt; 0.2:
        effect = &quot;very small&quot;
    elif abs_d &amp;lt; 0.5:
        effect = &quot;small&quot;
    elif abs_d &amp;lt; 0.8:
        effect = &quot;medium&quot;
    else:
        effect = &quot;large&quot;

    # -------------------------------------------------------------------------
    # 1-5. Store the results
    # -------------------------------------------------------------------------
    results.append({
        'Feature': feature,
        'Class 0 mean': class0_data.mean(),
        'Class 1 mean': class1_data.mean(),
        'Difference': class0_data.mean() - class1_data.mean(),
        't': t_stat,
        'p-value': p_value,
        &quot;Cohen's d&quot;: cohens_d,
        'Effect size': effect,
        'Significance': 'significant' if p_value &amp;lt; 0.05 else 'not significant'
    })

# -----------------------------------------------------------------------------
# 2. Tidy up and print the results
# -----------------------------------------------------------------------------
# Convert the results to a DataFrame for readability
results_df = pd.DataFrame(results)
results_df = results_df.round(4)  # round to 4 decimal places

print(&quot;\n[Class 0 vs Class 1 comparison]&quot;)
# Show only the key columns
display(results_df[['Feature', 'Difference', 'p-value', &quot;Cohen's d&quot;, 'Effect size', 'Significance']])

# -----------------------------------------------------------------------------
# 3. Visualization: p-values and effect sizes
# -----------------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(10, 6))

# -------------------------------------------------------------------------
# 3-1. Bar chart of p-values
# -------------------------------------------------------------------------
# Significant results (p&amp;lt;0.05) are red, the rest gray
colors = ['red' if p &amp;lt; 0.05 else 'gray' for p in results_df['p-value']]
bars = ax.bar(range(len(results_df)), results_df['p-value'], color=colors)

# Significance threshold line (&amp;alpha; = 0.05)
ax.axhline(0.05, color='black', linestyle='--', label='p=0.05')

# x-axis labels
ax.set_xticks(range(len(results_df)))
ax.set_xticklabels(results_df['Feature'], rotation=45)
ax.set_ylabel('p-value')
ax.set_title('Significance test result per feature')
ax.legend()

# -------------------------------------------------------------------------
# 3-2. Annotate each bar with its Cohen's d
# -------------------------------------------------------------------------
# Add the effect size as text above each bar
for bar, d in zip(bars, results_df[&quot;Cohen's d&quot;]):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
            f'd={d:.2f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;782&quot; data-origin-height=&quot;329&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/muU7F/btsQ1U9zmxB/myooGWTHV90jzWZanlwmkk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/muU7F/btsQ1U9zmxB/myooGWTHV90jzWZanlwmkk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/muU7F/btsQ1U9zmxB/myooGWTHV90jzWZanlwmkk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmuU7F%2FbtsQ1U9zmxB%2FmyooGWTHV90jzWZanlwmkk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;782&quot; height=&quot;329&quot; data-origin-width=&quot;782&quot; data-origin-height=&quot;329&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;590&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eywTr6/btsQ31TOKhk/MZJKZDVYIMcBn71Fakhkh1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eywTr6/btsQ31TOKhk/MZJKZDVYIMcBn71Fakhkh1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eywTr6/btsQ31TOKhk/MZJKZDVYIMcBn71Fakhkh1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeywTr6%2FbtsQ31TOKhk%2FMZJKZDVYIMcBn71Fakhkh1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;989&quot; height=&quot;590&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;590&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
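&lt;p data-ke-size=&quot;size16&quot;&gt;The Cohen's d in the loop above divides by sqrt((var0 + var1) / 2), which matches the pooled standard deviation only when both groups have the same size. A minimal sketch of the size-weighted version follows. The function names cohens_d_pooled and cohens_d_simple and the toy arrays are assumptions made for illustration; they are not from the original notebook or the Wine data.&lt;/p&gt;

```python
import numpy as np


def cohens_d_pooled(a, b):
    """Cohen's d using the size-weighted pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    # Pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def cohens_d_simple(a, b):
    """Same numerator, but the two variances are averaged without weights."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)


# Toy data with unequal group sizes (made up for illustration)
group_a = np.array([2.0, 4.0, 6.0, 8.0])
group_b = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

print(f"weighted pooled d: {cohens_d_pooled(group_a, group_b):.3f}")
print(f"simple-average d : {cohens_d_simple(group_a, group_b):.3f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With equal group sizes the two functions agree; the gap grows as the sizes and variances diverge.&lt;/p&gt;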
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;  self 실습&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  self 실습 01 : 레스토랑 매출 비교&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❓ 주말(토,일)과 평일(월~금)의 일일 평균 매출에 차이가 있는가? ❓&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; 두 독립 그룹의 비교니까 &lt;b&gt;ttest_ind()&lt;/b&gt; 사용하기?&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Generate hypothetical restaurant sales data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu

np.random.seed(42)

# Four weeks of data (28 days); the day names run Mon..Sun in Korean
days = ['월', '화', '수', '목', '금', '토', '일'] * 4
dates = pd.date_range('2024-01-01', periods=28)

# Weekdays average 100 and weekends 130 in sales (unit: 10,000 KRW)
daily_sales = []
for day in days:
    if day in ['토', '일']:
        # Weekend (Sat, Sun): mean 130, standard deviation 20
        sale = np.random.normal(130, 20)
    else:
        # Weekday: mean 100, standard deviation 15
        sale = np.random.normal(100, 15)
    daily_sales.append(sale)

# Build the DataFrame
# Columns: 날짜 = date, 요일 = day of week, 매출액 = sales,
#          주말여부 = weekend flag (주말 = weekend, 평일 = weekday)
sales_df = pd.DataFrame({
    '날짜': dates,
    '요일': days,
    '매출액': daily_sales,
    '주말여부': ['주말' if d in ['토', '일'] else '평일' for d in days]
})

# Split into weekend and weekday data
weekend_sales = sales_df[sales_df['주말여부'] == '주말']['매출액']
weekday_sales = sales_df[sales_df['주말여부'] == '평일']['매출액']

print(&quot;=&quot;*60)
print(&quot;Practice problem 1: Restaurant sales comparison&quot;)
print(&quot;=&quot;*60)
print(f&quot;\nData period: 4 weeks (28 days)&quot;)
print(f&quot;Weekday (Mon-Fri) data: n={len(weekday_sales)}&quot;)
print(f&quot;Weekend (Sat, Sun) data: n={len(weekend_sales)}&quot;)
print(&quot;\n[Data preview]&quot;)
display(sales_df.head(7))  # first week of data
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;794&quot; data-origin-height=&quot;530&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/URCd6/btsQ2ivCw8I/dwEwFSE8ykmr7ZUW7QykEK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/URCd6/btsQ2ivCw8I/dwEwFSE8ykmr7ZUW7QykEK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/URCd6/btsQ2ivCw8I/dwEwFSE8ykmr7ZUW7QykEK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FURCd6%2FbtsQ2ivCw8I%2FdwEwFSE8ykmr7ZUW7QykEK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;794&quot; height=&quot;530&quot; data-origin-width=&quot;794&quot; data-origin-height=&quot;530&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;dd = sales_df.groupby([&quot;주말여부&quot;])[&quot;매출액&quot;].describe()
dd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;930&quot; data-origin-height=&quot;159&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c62Y3x/btsQ3klX3cX/vIMCGbi08dPqhSBHww0KlK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c62Y3x/btsQ3klX3cX/vIMCGbi08dPqhSBHww0KlK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c62Y3x/btsQ3klX3cX/vIMCGbi08dPqhSBHww0KlK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc62Y3x%2FbtsQ3klX3cX%2FvIMCGbi08dPqhSBHww0KlK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;930&quot; height=&quot;159&quot; data-origin-width=&quot;930&quot; data-origin-height=&quot;159&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;markdown&quot;&gt;&lt;code&gt;cond0 = sales_df[sales_df[&quot;주말여부&quot;] == &quot;평일&quot;][&quot;매출액&quot;]
cond1 = sales_df[sales_df[&quot;주말여부&quot;] == &quot;주말&quot;][&quot;매출액&quot;]

cond1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;271&quot; data-origin-height=&quot;232&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/t3yPF/btsQ3XRqIo7/Nl4GRdjYQRYQuSB9hMePU0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/t3yPF/btsQ3XRqIo7/Nl4GRdjYQRYQuSB9hMePU0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/t3yPF/btsQ3XRqIo7/Nl4GRdjYQRYQuSB9hMePU0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Ft3yPF%2FbtsQ3XRqIo7%2FNl4GRdjYQRYQuSB9hMePU0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;271&quot; height=&quot;232&quot; data-origin-width=&quot;271&quot; data-origin-height=&quot;232&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Visualization
fig, axes = plt.subplots(1, 4, figsize=(20, 6))

# Box plot
bp = axes[0].boxplot([cond0, cond1], labels=[&quot;Weekday&quot;, &quot;Weekend&quot;], patch_artist=True)
bp[&quot;boxes&quot;][0].set_facecolor(&quot;#0063b2&quot;)
bp[&quot;boxes&quot;][1].set_facecolor(&quot;#e94b3c&quot;)
axes[0].set_title(&quot;Sales by weekday vs. weekend&quot;)
axes[0].grid(axis=&quot;y&quot;, alpha=0.5)

# Histogram
axes[1].hist(cond0, bins=5, alpha=0.6, label=&quot;Weekday&quot;, color=&quot;#0063b2&quot;,
             edgecolor=&quot;black&quot;, density=True)
axes[1].hist(cond1, bins=5, alpha=0.6, label=&quot;Weekend&quot;, color=&quot;#e94b3c&quot;,
             edgecolor=&quot;black&quot;, density=True)
axes[1].set_xlabel(&quot;Sales&quot;)
axes[1].set_title(&quot;Sales by weekday vs. weekend&quot;)
axes[1].legend()
axes[1].grid(axis=&quot;y&quot;, alpha=0.5)

# Q-Q plots (one per group)
stats.probplot(cond0, dist=&quot;norm&quot;, plot=axes[2])
axes[2].set_title('Q-Q Plot (Weekday)')
axes[2].grid(True, alpha=0.3)

stats.probplot(cond1, dist=&quot;norm&quot;, plot=axes[3])
axes[3].set_title('Q-Q Plot (Weekend)')
axes[3].grid(True, alpha=0.3)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1609&quot; data-origin-height=&quot;544&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cYjEKD/btsQ2mSpL1R/uSybKjvm3Rz538BLf8kC8k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cYjEKD/btsQ2mSpL1R/uSybKjvm3Rz538BLf8kC8k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cYjEKD/btsQ2mSpL1R/uSybKjvm3Rz538BLf8kC8k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcYjEKD%2FbtsQ2mSpL1R%2FuSybKjvm3Rz538BLf8kC8k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1609&quot; height=&quot;544&quot; data-origin-width=&quot;1609&quot; data-origin-height=&quot;544&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Check normality according to sample size&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*50)
print(&quot;Hypothesis testing process&quot;)
print(&quot;=&quot;*50)

# Step 1: normality test
is_normal_0 = check_normality_simple(cond0, &quot;Weekday sales&quot;)
is_normal_1 = check_normality_simple(cond1, &quot;Weekend sales&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;664&quot; data-origin-height=&quot;421&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d6LKpH/btsQ2sSl8l3/FqzPpxGlHjJyxz9cWrZul0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d6LKpH/btsQ2sSl8l3/FqzPpxGlHjJyxz9cWrZul0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d6LKpH/btsQ2sSl8l3/FqzPpxGlHjJyxz9cWrZul0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd6LKpH%2FbtsQ2sSl8l3%2FFqzPpxGlHjJyxz9cWrZul0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;664&quot; height=&quot;421&quot; data-origin-width=&quot;664&quot; data-origin-height=&quot;421&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Equal-variance test (independent samples only): stats.levene&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# Step 2: equal-variance test
print(&quot;\n[Equal-variance test]&quot;)
print(&quot;-&quot;*40)
stat, p_levene = levene(cond0, cond1)
print(f&quot;Levene's test p-value: {p_levene:.4f}&quot;)
equal_var = p_levene &amp;gt; 0.05
print(f&quot;Result: {'✅ Equal variances assumed' if equal_var else '❌ Unequal variances &amp;rarr; use Welch t-test'}&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;351&quot; data-origin-height=&quot;131&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dwlJAB/btsQ3DlfuRD/Mqxj8IQIcB9kkTwRIwkMDK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dwlJAB/btsQ3DlfuRD/Mqxj8IQIcB9kkTwRIwkMDK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dwlJAB/btsQ3DlfuRD/Mqxj8IQIcB9kkTwRIwkMDK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdwlJAB%2FbtsQ3DlfuRD%2FMqxj8IQIcB9kkTwRIwkMDK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;351&quot; height=&quot;131&quot; data-origin-width=&quot;351&quot; data-origin-height=&quot;131&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Welch t-test: ttest_ind(equal_var=False)&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Step 3: run the Welch t-test
print(&quot;\n[Welch t-test]&quot;)
print(&quot;-&quot;*40)
tstat, pval = stats.ttest_ind(cond0, cond1, equal_var=False)
print(&quot;Weekday vs. weekend&quot;)
# The sign follows the argument order: cond0 (weekday) minus cond1 (weekend)
print(f&quot;t statistic (weekday - weekend) : {tstat:.4f}&quot;, f&quot;\np-value : {pval:.4f}&quot;)
print(&quot;-&quot;*40)

if pval &amp;lt; 0.05:
    print(f&quot;✅ p-value({pval:.4f}) &amp;lt; 0.05 &amp;rarr; reject the null hypothesis&quot;)
    print(f&quot;    Weekday and weekend sales differ significantly&quot;)
    print(f&quot;    (a statistically significant difference exists)&quot;)
else:
    print(f&quot;❌ p-value({pval:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject the null hypothesis&quot;)
    print(f&quot;    No significant difference between weekday and weekend sales was detected&quot;)
    print(f&quot;    (the observed difference may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;466&quot; data-origin-height=&quot;247&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cPP9PX/btsQ2hcldU6/xMJf6cxKuokaC0fgmG3Krk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cPP9PX/btsQ2hcldU6/xMJf6cxKuokaC0fgmG3Krk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cPP9PX/btsQ2hcldU6/xMJf6cxKuokaC0fgmG3Krk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcPP9PX%2FbtsQ2hcldU6%2FxMJf6cxKuokaC0fgmG3Krk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;466&quot; height=&quot;247&quot; data-origin-width=&quot;466&quot; data-origin-height=&quot;247&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
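&lt;p data-ke-size=&quot;size16&quot;&gt;To see what equal_var=False changes, Welch's statistic and its Welch-Satterthwaite degrees of freedom can be computed directly and compared with SciPy. This is a sketch under assumptions: the helper welch_t is hypothetical, and the two samples are regenerated with an arbitrary seed instead of reusing cond0 and cond1, so the numbers will not match the screenshot above.&lt;/p&gt;

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Return (t, df, two-sided p) for Welch's unequal-variance t-test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size   # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size   # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


rng = np.random.default_rng(0)            # arbitrary seed
weekday = rng.normal(100, 15, size=20)    # stand-in for weekday sales
weekend = rng.normal(130, 20, size=8)     # stand-in for weekend sales

t_manual, df_manual, p_manual = welch_t(weekday, weekend)
t_scipy, p_scipy = stats.ttest_ind(weekday, weekend, equal_var=False)

print(f"manual: t = {t_manual:.4f}, df = {df_manual:.2f}, p = {p_manual:.4f}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The degrees of freedom are usually not an integer and fall below n0 + n1 - 2, which is why the Welch p-value tends to be a little more conservative than Student's when the variances differ.&lt;/p&gt;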
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;5️⃣ Hypothesis test&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;6️⃣ Conclusion&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# =============================================================================
# Step 4: Hypothesis test
# =============================================================================
print(&quot;\n[Hypothesis test]&quot;)
print(&quot;-&quot;*40)

# State the hypotheses
print(&quot;H₀: &amp;mu;₀ = &amp;mu;₁ (no significant difference between weekday and weekend sales)&quot;)
print(&quot;H₁: &amp;mu;₀ &amp;ne; &amp;mu;₁ (weekday and weekend sales differ significantly)&quot;)
print(&quot;Significance level: &amp;alpha; = 0.05&quot;)

# -----------------------------------------------------------------------------
# 4-1. Choose and run the test
# -----------------------------------------------------------------------------
# Pick a parametric or nonparametric test based on the normality results
if is_normal_0 and is_normal_1:
    # Parametric: independent two-sample t-test (both groups look normal)
    t_stat, p_value = ttest_ind(cond0, cond1, equal_var=equal_var)
    test_name = &quot;Student's t-test&quot; if equal_var else &quot;Welch's t-test&quot;
    print(f&quot;\n{test_name} result:&quot;)
    print(f&quot;t = {t_stat:.4f}, p = {p_value:.4f}&quot;)
    
    # Cohen's d effect size (standardized mean difference)
    # d = (mean0 - mean1) / pooled standard deviation
    # Note: averaging the two variances equals the pooled SD only when both groups have the same size
    pooled_std = np.sqrt((cond0.var() + cond1.var()) / 2)
    cohens_d = (cond0.mean() - cond1.mean()) / pooled_std
    abs_d = abs(cohens_d)
    
    # Conventional thresholds for interpreting Cohen's d
    if abs_d &amp;lt; 0.2:
        effect = &quot;negligible effect&quot;
    elif abs_d &amp;lt; 0.5:
        effect = &quot;small effect&quot;
    elif abs_d &amp;lt; 0.8:
        effect = &quot;medium effect&quot;
    else:
        effect = &quot;large effect&quot;
    
    print(f&quot;Cohen's d = {cohens_d:.3f} ({effect})&quot;)

else:
    # Nonparametric: Mann-Whitney U test (normality assumption violated)
    # Rank-based comparison of the two distributions (often read as a shift in location)
    u_stat, p_value = mannwhitneyu(cond0, cond1, alternative='two-sided')
    print(f&quot;\nMann-Whitney U test result:&quot;)
    print(f&quot;U = {u_stat:.4f}, p = {p_value:.4f}&quot;)

# -----------------------------------------------------------------------------
# 4-2. Statistical conclusion
# -----------------------------------------------------------------------------
print(f&quot;\n[Conclusion]&quot;)

# Compare the p-value with the significance level (&amp;alpha;=0.05)
if p_value &amp;lt; 0.05:
    print(f&quot;✅ p-value({p_value:.4f}) &amp;lt; 0.05 &amp;rarr; reject the null hypothesis&quot;)
    print(f&quot;   Weekday and weekend sales differ significantly&quot;)
    print(f&quot;   (a statistically meaningful difference exists)&quot;)
else:
    print(f&quot;❌ p-value({p_value:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject the null hypothesis&quot;)
    print(f&quot;   No significant difference between weekday and weekend sales&quot;)
    print(f&quot;   (the observed difference may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;547&quot; data-origin-height=&quot;370&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/EVKxA/btsQ3XcN1OB/mURElfkByS2gJO6rnGjAY0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/EVKxA/btsQ3XcN1OB/mURElfkByS2gJO6rnGjAY0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/EVKxA/btsQ3XcN1OB/mURElfkByS2gJO6rnGjAY0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FEVKxA%2FbtsQ3XcN1OB%2FmURElfkByS2gJO6rnGjAY0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;547&quot; height=&quot;370&quot; data-origin-width=&quot;547&quot; data-origin-height=&quot;370&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
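&lt;p data-ke-size=&quot;size16&quot;&gt;The effect-size step above averages the two sample variances, which matches the pooled standard deviation only when both groups have the same size. Below is a minimal stdlib-only sketch of the df-weighted form. The helper name cohens_d_pooled and the sample values are hypothetical and are not part of the original notebook.&lt;/p&gt;

```python
import math
import statistics


def cohens_d_pooled(group_a, group_b):
    """Cohen's d using the df-weighted pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    # Weight each variance by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / math.sqrt(pooled_var)


# Hypothetical sales figures with unequal group sizes
weekday = [10.0, 12.0, 14.0]
weekend = [13.0, 15.0, 17.0, 19.0]
d = cohens_d_pooled(weekday, weekend)
print(f"Cohen's d = {d:.3f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;When the two groups are the same size, this gives the same value as the simple average of variances, so the difference only matters for unbalanced groups.&lt;/p&gt;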
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  self 실습 02 : 운동 프로그램 효과 평가&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❓ 8주 운동 프로그램이 체지방률 감소에 효과가 있는가❓&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; 같은 대상의 전후 비교니까 &lt;b&gt;ttest_rel()&lt;/b&gt; 사용하기?&lt;/p&gt;
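&lt;p data-ke-size=&quot;size16&quot;&gt;A paired test fits because each participant serves as their own control: the test works on the per-participant differences d, and the statistic is t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. The sketch below computes only that statistic with the standard library (no p-value). The helper name paired_t_statistic and the sample values are hypothetical.&lt;/p&gt;

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired before/after measurements."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical body fat percentages for four participants
before = [30.0, 28.0, 25.0, 32.0]
after = [28.0, 27.0, 22.0, 30.0]
t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.4f}, df = {dof}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;A large positive t here means body fat was consistently higher before the program than after it.&lt;/p&gt;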
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# Generate synthetic data (25 participants)
np.random.seed(123)
n_participants = 25

# Body fat percentage before the program (%)
before_fat = np.random.normal(28, 5, n_participants)
before_fat = np.clip(before_fat, 15, 40)  # keep values in a realistic range

# Body fat percentage after the program
# (gamma(2, 1) has mean 2, so about 2 percentage points lower on average, with individual variation)
reduction = np.random.gamma(2, 1, n_participants)  # reductions are always positive
after_fat = before_fat - reduction
after_fat = np.clip(after_fat, 10, 39)  # keep values in a realistic range

# Build the DataFrame
fitness_df = pd.DataFrame({
    '참가자ID': [f'ID{i:03d}' for i in range(1, n_participants+1)],
    '운동전_체지방률': before_fat,
    '운동후_체지방률': after_fat,
    '체지방_감소량': before_fat - after_fat
})

print(&quot;=&quot;*60)
print(&quot;Practice problem 2: Evaluating an exercise program&quot;)
print(&quot;=&quot;*60)
print(f&quot;\nParticipants: {n_participants}&quot;)
print(&quot;\n[Data sample]&quot;)
display(fitness_df.head())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;795&quot; data-origin-height=&quot;411&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eopTso/btsQ3qT2rbm/fUDcYK3dKOO8wbchUNfFnk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eopTso/btsQ3qT2rbm/fUDcYK3dKOO8wbchUNfFnk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eopTso/btsQ3qT2rbm/fUDcYK3dKOO8wbchUNfFnk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeopTso%2FbtsQ3qT2rbm%2FfUDcYK3dKOO8wbchUNfFnk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;795&quot; height=&quot;411&quot; data-origin-width=&quot;795&quot; data-origin-height=&quot;411&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Before/after line plot, one line per participant
for i in range(len(fitness_df)):
    axes[0].plot([0, 1], [fitness_df.iloc[i][&quot;운동전_체지방률&quot;], fitness_df.iloc[i][&quot;운동후_체지방률&quot;]], 
                'gray', alpha=0.4, linewidth=0.8)
axes[0].plot([0, 1], [fitness_df[&quot;운동전_체지방률&quot;].mean(), fitness_df[&quot;운동후_체지방률&quot;].mean()], 
            'red', linewidth=3, marker='o', markersize=8, label='Mean')
axes[0].set_xticks([0, 1])
axes[0].set_xticklabels([&quot;Before&quot;, &quot;After&quot;])
axes[0].set_ylabel(&quot;Body fat (%)&quot;)
axes[0].set_title('Change per participant')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Box plot
# (Matplotlib 3.9+ prefers tick_labels= over labels=)
bp = axes[1].boxplot([fitness_df[&quot;운동전_체지방률&quot;], fitness_df[&quot;운동후_체지방률&quot;]], 
                     labels=[&quot;Before&quot;, &quot;After&quot;], patch_artist=True)
bp['boxes'][0].set_facecolor(&quot;#ead98b&quot;)
bp['boxes'][1].set_facecolor(&quot;#7dd0b6&quot;)
axes[1].set_ylabel(&quot;Body fat (%)&quot;)
axes[1].set_title(&quot;Body fat before vs. after&quot;)
axes[1].grid(True, alpha=0.3)

# Histogram of the reductions
axes[2].hist(fitness_df[&quot;체지방_감소량&quot;], bins=10, edgecolor='black', alpha=0.7, color=&quot;#93c763&quot;)
axes[2].axvline(0, color='red', linestyle='--', linewidth=2, label='No change')
axes[2].axvline(fitness_df[&quot;체지방_감소량&quot;].mean(), color='blue', linestyle='--', 
               linewidth=2, label=f'Mean: {fitness_df[&quot;체지방_감소량&quot;].mean():.1f}')
axes[2].set_xlabel(&quot;Body fat reduction&quot;)
axes[2].set_ylabel(&quot;Frequency&quot;)
axes[2].set_title(&quot;Distribution of body fat reduction&quot;)
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1389&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xasBe/btsQ38kZjeR/nI4OJumlYOaJwCDLjHiwk1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xasBe/btsQ38kZjeR/nI4OJumlYOaJwCDLjHiwk1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xasBe/btsQ38kZjeR/nI4OJumlYOaJwCDLjHiwk1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxasBe%2FbtsQ38kZjeR%2FnI4OJumlYOaJwCDLjHiwk1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1389&quot; height=&quot;490&quot; data-origin-width=&quot;1389&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Check normality according to sample size&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*50)
print(&quot;Hypothesis testing process&quot;)
print(&quot;=&quot;*50)

# Step 1: Normality test on the differences (a paired test assumes the differences are normal)
is_normal_diff = check_normality_simple(fitness_df[&quot;체지방_감소량&quot;], &quot;body fat reduction&quot;)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;689&quot; data-origin-height=&quot;264&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dkB1Gh/btsQ3ImxnKl/0ZEfhrHOLZkRV5S4LphbTK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dkB1Gh/btsQ3ImxnKl/0ZEfhrHOLZkRV5S4LphbTK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dkB1Gh/btsQ3ImxnKl/0ZEfhrHOLZkRV5S4LphbTK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdkB1Gh%2FbtsQ3ImxnKl%2F0ZEfhrHOLZkRV5S4LphbTK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;689&quot; height=&quot;264&quot; data-origin-width=&quot;689&quot; data-origin-height=&quot;264&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; The differences fail the normality check, so a nonparametric test (the Wilcoxon signed-rank test) is needed&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Hypothesis test&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Wilcoxon signed-rank test&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;5️⃣ Drawing a conclusion&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# =============================================================================
# Step 2: Hypothesis test
# =============================================================================
print(&quot;\n[Hypothesis test]&quot;)
print(&quot;-&quot;*40)

# State the hypotheses (paired-sample test)
print(&quot;H₀: &amp;mu;_before = &amp;mu;_after (the program has no effect)&quot;)
print(&quot;H₁: &amp;mu;_before &amp;ne; &amp;mu;_after (the program has an effect)&quot;)
print(&quot;Significance level: &amp;alpha; = 0.05&quot;)
alpha = 0.05

# -----------------------------------------------------------------------------
# 2-1. Choose and run the test
# -----------------------------------------------------------------------------
# Pick a parametric or nonparametric test based on the normality of the differences
if is_normal_diff:
    # Parametric: paired t-test (the differences look normal)
    # Same participants measured before and after, so a paired test applies
    t_stat, p_value = ttest_rel(fitness_df[&quot;운동전_체지방률&quot;], fitness_df[&quot;운동후_체지방률&quot;])
    print(f&quot;\nPaired t-test result:&quot;)
    print(f&quot;t = {t_stat:.4f}, p = {p_value:.4f}&quot;)
    
    # -------------------------------------------------------------------------
    # 2-2. Effect size (Cohen's d for paired samples)
    # -------------------------------------------------------------------------
    # Paired Cohen's d = mean difference / standard deviation of the differences
    cohens_d = fitness_df[&quot;체지방_감소량&quot;].mean() / fitness_df[&quot;체지방_감소량&quot;].std()
    abs_d = abs(cohens_d)
    
    # Conventional thresholds for interpreting Cohen's d
    if abs_d &amp;lt; 0.2:
        effect = &quot;negligible effect&quot;
    elif abs_d &amp;lt; 0.5:
        effect = &quot;small effect&quot;
    elif abs_d &amp;lt; 0.8:
        effect = &quot;medium effect&quot;
    else:
        effect = &quot;large effect&quot;
    
    print(f&quot;Cohen's d = {cohens_d:.3f} ({effect})&quot;)
    
    # -------------------------------------------------------------------------
    # 2-3. Confidence interval
    # -------------------------------------------------------------------------
    # 95% confidence interval for the mean difference
    # CI = mean &amp;plusmn; t(&amp;alpha;/2, df) &amp;times; SE
    confidence = 0.95  # confidence level
    n = len(fitness_df)  # sample size (fixed: the original referenced an undefined treatment_df)
    mean_diff = fitness_df[&quot;체지방_감소량&quot;].mean()  # mean difference
    se_diff = stats.sem(fitness_df[&quot;체지방_감소량&quot;])  # standard error
    
    # t-distribution interval (degrees of freedom = n-1)
    ci = stats.t.interval(confidence, n-1, loc=mean_diff, scale=se_diff)
    print(f&quot;95% CI of the mean change: [{ci[0]:.2f}, {ci[1]:.2f}]&quot;)
    
else:
    # Nonparametric: Wilcoxon signed-rank test (normality assumption violated)
    # Tests whether the paired differences are centered on zero (based on signs and ranks)
    w_stat, p_value = wilcoxon(fitness_df[&quot;운동전_체지방률&quot;], fitness_df[&quot;운동후_체지방률&quot;])
    print(f&quot;\nWilcoxon signed-rank test result:&quot;)
    print(f&quot;W = {w_stat:.7f}, p = {p_value:.7f}&quot;)

# -----------------------------------------------------------------------------
# 2-4. Statistical conclusion
# -----------------------------------------------------------------------------
print(f&quot;\n[Conclusion]&quot;)

# Compare the p-value with the significance level (&amp;alpha;=0.05)
if p_value &amp;lt; alpha:
    print(f&quot;✅ p-value({p_value:.4f}) &amp;lt; 0.05 &amp;rarr; reject the null hypothesis&quot;)
    print(f&quot;   The program has an effect (mean reduction of {abs(fitness_df['체지방_감소량'].mean()):.1f})&quot;)
    print(f&quot;   (a statistically significant improvement)&quot;)
else:
    print(f&quot;❌ p-value({p_value:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject the null hypothesis&quot;)
    print(f&quot;   The effect of the program is not significant&quot;)
    print(f&quot;   (the observed change may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;392&quot; data-origin-height=&quot;339&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/UvGFl/btsQ1vh4XQ3/tb8lUp3Dzq4D7BU3YETJU0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/UvGFl/btsQ1vh4XQ3/tb8lUp3Dzq4D7BU3YETJU0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/UvGFl/btsQ1vh4XQ3/tb8lUp3Dzq4D7BU3YETJU0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FUvGFl%2FbtsQ1vh4XQ3%2Ftb8lUp3Dzq4D7BU3YETJU0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;392&quot; height=&quot;339&quot; data-origin-width=&quot;392&quot; data-origin-height=&quot;339&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
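&lt;p data-ke-size=&quot;size16&quot;&gt;To make &quot;based on signs and ranks&quot; concrete, the sketch below computes the two signed-rank sums by hand with plain Python: zero differences are dropped, the absolute differences are ranked (ties share the average rank), and the ranks are summed separately for positive and negative differences. The helper name signed_rank_sums and the sample values are hypothetical. As far as I know, SciPy's two-sided wilcoxon reports the smaller of the two sums as W by default, but this sketch does not compute a p-value.&lt;/p&gt;

```python
def signed_rank_sums(diffs):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    nonzero = [d for d in diffs if d != 0]  # zero differences carry no sign
    magnitudes = [abs(d) for d in nonzero]
    w_plus = 0.0
    w_minus = 0.0
    for d in nonzero:
        smaller = sum(1 for m in magnitudes if abs(d) > m)
        tied = sum(1 for m in magnitudes if m == abs(d))
        rank = smaller + (tied + 1) / 2  # average rank within a tie group
        if d > 0:
            w_plus += rank
        else:
            w_minus += rank
    return w_plus, w_minus


# Hypothetical before-minus-after differences for five participants
reductions = [2.0, 1.0, 3.0, -0.5, 1.0]
w_plus, w_minus = signed_rank_sums(reductions)
print(f"W+ = {w_plus}, W- = {w_minus}, min = {min(w_plus, w_minus)}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;When almost all of the rank mass sits on one side, as it does here, the data speak against differences centered on zero.&lt;/p&gt;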
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&amp;nbsp;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  self 실습 03 : 제품 품질 검사&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❓ 생산된 음료의 실제 용량이 표기 용량(500ml)과 일치하는가❓&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; 한 그룹과 기준값의 비교니까 ttest_1samp() 사용하기 ?&lt;/p&gt;
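&lt;p data-ke-size=&quot;size16&quot;&gt;For reference, the one-sample t statistic measures how many standard errors the sample mean lies from the reference value: t = (mean - target) / (s / sqrt(n)). The sketch below computes only the statistic with the standard library. The helper name one_sample_t and the sample volumes are hypothetical.&lt;/p&gt;

```python
import math
import statistics


def one_sample_t(values, target):
    """t statistic for testing whether the mean of `values` equals `target`."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - target) / se


# Hypothetical measured volumes in ml, against a 500 ml label
volumes = [498.0, 497.0, 499.0, 496.0, 500.0]
t_stat = one_sample_t(volumes, 500)
print(f"t = {t_stat:.4f}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;A negative t means the sample mean falls below the labeled volume; whether that is significant still depends on the p-value from the t distribution with n - 1 degrees of freedom.&lt;/p&gt;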
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Generate synthetic data (40 samples)
np.random.seed(456)
n_samples = 40

# Actual volume (mean 498ml, SD 3ml: simulates slightly underfilled bottles)
actual_volume = np.random.normal(498, 3, n_samples)

# Build the DataFrame
quality_df = pd.DataFrame({
    '샘플번호': [f'S{i:03d}' for i in range(1, n_samples+1)],
    '실제용량': actual_volume,
    '표기용량과의_차이': actual_volume - 500
})

print(&quot;=&quot;*60)
print(&quot;Practice problem 3: Product quality inspection&quot;)
print(&quot;=&quot;*60)
print(f&quot;\nSamples: {n_samples}&quot;)
print(f&quot;Labeled volume: 500ml&quot;)
print(&quot;\n[Data summary]&quot;)
print(f&quot;Mean actual volume: {actual_volume.mean():.2f}ml&quot;)
print(f&quot;SD of actual volume: {actual_volume.std():.2f}ml&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;787&quot; data-origin-height=&quot;258&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mx5BV/btsQ3EYLheH/aokjH8V7jmJWVxZ1HKYFC1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mx5BV/btsQ3EYLheH/aokjH8V7jmJWVxZ1HKYFC1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mx5BV/btsQ3EYLheH/aokjH8V7jmJWVxZ1HKYFC1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fmx5BV%2FbtsQ3EYLheH%2FaokjH8V7jmJWVxZ1HKYFC1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;787&quot; height=&quot;258&quot; data-origin-width=&quot;787&quot; data-origin-height=&quot;258&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;nginx&quot;&gt;&lt;code&gt;quality_df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;430&quot; data-origin-height=&quot;418&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/va4Eb/btsQ2RxyLkT/RiltJnDGN0CMZaCZ64VlK1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/va4Eb/btsQ2RxyLkT/RiltJnDGN0CMZaCZ64VlK1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/va4Eb/btsQ2RxyLkT/RiltJnDGN0CMZaCZ64VlK1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fva4Eb%2FbtsQ2RxyLkT%2FRiltJnDGN0CMZaCZ64VlK1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;430&quot; height=&quot;418&quot; data-origin-width=&quot;430&quot; data-origin-height=&quot;418&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;prolog&quot;&gt;&lt;code&gt;# Visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
target_value = 500

# Histogram (units fixed from cm to ml)
axes[0].hist(quality_df[&quot;실제용량&quot;], bins=12, edgecolor='black', alpha=0.7, color=&quot;#aed6df&quot;)
axes[0].axvline(target_value, color='red', linestyle='--', linewidth=2, label=f'Target: {target_value}ml')
axes[0].axvline(quality_df[&quot;실제용량&quot;].mean(), color='green', linestyle='--', linewidth=2, 
                label=f'Mean: {quality_df[&quot;실제용량&quot;].mean():.2f}ml')
axes[0].set_xlabel('Actual volume (ml)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Actual drink volume')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Box plot
bp = axes[1].boxplot(quality_df[&quot;실제용량&quot;], patch_artist=True)
bp['boxes'][0].set_facecolor(&quot;#aed6df&quot;)
axes[1].axhline(target_value, color='red', linestyle='--', linewidth=2)
axes[1].set_ylabel('Actual volume (ml)')
axes[1].set_title('Box plot')
axes[1].text(1.1, target_value+0.05, f'Target: {target_value}', color='red')
axes[1].grid(True, alpha=0.3)

# Q-Q plot
stats.probplot(quality_df[&quot;실제용량&quot;], dist=&quot;norm&quot;, plot=axes[2])
axes[2].set_title('Q-Q Plot')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1389&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cslA8R/btsQ2zw1Rqs/R5Jl1QKT4S29njjIo4l9ck/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cslA8R/btsQ2zw1Rqs/R5Jl1QKT4S29njjIo4l9ck/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cslA8R/btsQ2zw1Rqs/R5Jl1QKT4S29njjIo4l9ck/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcslA8R%2FbtsQ2zw1Rqs%2FR5Jl1QKT4S29njjIo4l9ck%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1389&quot; height=&quot;490&quot; data-origin-width=&quot;1389&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Check normality according to sample size&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Normality test&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;print(&quot;\n&quot; + &quot;=&quot;*50)
print(&quot;Hypothesis testing process&quot;)
print(&quot;=&quot;*50)

# Step 1: Normality test
is_normal = check_normality_simple(quality_df[&quot;실제용량&quot;], &quot;actual drink volume&quot;)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;660&quot; data-origin-height=&quot;248&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cR4zoB/btsQ3j8qvHC/CosgW20aGRO4fdAvXr5MRk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cR4zoB/btsQ3j8qvHC/CosgW20aGRO4fdAvXr5MRk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cR4zoB/btsQ3j8qvHC/CosgW20aGRO4fdAvXr5MRk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcR4zoB%2FbtsQ3j8qvHC%2FCosgW20aGRO4fdAvXr5MRk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;660&quot; height=&quot;248&quot; data-origin-width=&quot;660&quot; data-origin-height=&quot;248&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Hypothesis test&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Drawing a conclusion&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;# =============================================================================
# Step 2: Hypothesis test
# =============================================================================
print(&quot;\n[Hypothesis test]&quot;)
print(&quot;-&quot;*40)

# State the hypotheses (one-sample test)
# Test whether the population mean equals a specific value (target_value)
print(f&quot;H₀: &amp;mu; = {target_value}ml (the mean is {target_value}ml)&quot;)
print(f&quot;H₁: &amp;mu; &amp;ne; {target_value}ml (the mean is not {target_value}ml)&quot;)
print(&quot;Significance level: &amp;alpha; = 0.05&quot;)

# -----------------------------------------------------------------------------
# 2-1. Choose and run the test
# -----------------------------------------------------------------------------
# Pick a parametric or nonparametric test based on the normality result
if is_normal:
    # Parametric: one-sample t-test (the data look normal)
    # Compares the sample mean with the hypothesized population mean (target_value)
    t_stat, p_value = ttest_1samp(quality_df[&quot;실제용량&quot;], target_value)
    print(f&quot;\nOne-sample t-test result:&quot;)
    print(f&quot;t = {t_stat:.4f}, p = {p_value:.4f}&quot;)
    
    # -------------------------------------------------------------------------
    # 2-2. Confidence interval and interpretation
    # -------------------------------------------------------------------------
    # 95% confidence interval for the population mean
    # CI = sample mean &amp;plusmn; t(&amp;alpha;/2, df) &amp;times; SE
    confidence = 0.95  # confidence level
    n = len(quality_df[&quot;실제용량&quot;])  # sample size
    mean = quality_df[&quot;실제용량&quot;].mean()  # sample mean
    se = stats.sem(quality_df[&quot;실제용량&quot;])  # standard error (SE = s/&amp;radic;n)
    
    # t-distribution interval (degrees of freedom = n-1)
    ci = stats.t.interval(confidence, n-1, loc=mean, scale=se)
    print(f&quot;95% CI of the mean: [{ci[0]:.3f}, {ci[1]:.3f}]ml&quot;)
    
    # Compare the interval with the target value
    # If the target lies inside the interval, H₀ cannot be rejected
    if ci[0] &amp;lt;= target_value &amp;lt;= ci[1]:
        print(f&quot;&amp;rarr; {target_value}ml is inside the confidence interval&quot;)
    else:
        print(f&quot;&amp;rarr; {target_value}ml is outside the confidence interval&quot;)
        
else:
    # Nonparametric: Wilcoxon signed-rank test (normality assumption violated)
    # Tests whether the data are centered on the target value (based on signs and ranks)
    
    # Difference between each observation and the target value
    differences = quality_df[&quot;실제용량&quot;] - target_value
    
    # Run the Wilcoxon signed-rank test
    # Ranks the absolute differences, then accounts for their signs
    w_stat, p_value = wilcoxon(differences)
    print(f&quot;\nWilcoxon signed-rank test result:&quot;)
    print(f&quot;W = {w_stat:.4f}, p = {p_value:.4f}&quot;)

# -----------------------------------------------------------------------------
# 2-3. Statistical conclusion
# -----------------------------------------------------------------------------
print(f&quot;\n[Conclusion]&quot;)

# Compare the p-value with the significance level (&amp;alpha;=0.05)
if p_value &amp;lt; 0.05:
    print(f&quot;✅ p-value({p_value:.4f}) &amp;lt; 0.05 &amp;rarr; reject the null hypothesis&quot;)
    
    # Direction of the difference (is the mean above or below the target?)
    diff = quality_df[&quot;실제용량&quot;].mean() - target_value
    if diff &amp;gt; 0:
        print(f&quot;   The mean ({quality_df['실제용량'].mean():.3f}ml) is significantly higher than {target_value}ml&quot;)
    else:
        print(f&quot;   The mean ({quality_df['실제용량'].mean():.3f}ml) is significantly lower than {target_value}ml&quot;)
    print(f&quot;   (a statistically meaningful difference exists)&quot;)
    
else:
    print(f&quot;❌ p-value({p_value:.4f}) &amp;ge; 0.05 &amp;rarr; fail to reject the null hypothesis&quot;)
    print(f&quot;   The mean does not differ significantly from {target_value}ml&quot;)
    print(f&quot;   (the observed difference may be due to chance)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;427&quot; data-origin-height=&quot;397&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b2FSWW/btsQ2AQfAK2/Io7wSKnVGZdG4tyGujxDKk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b2FSWW/btsQ2AQfAK2/Io7wSKnVGZdG4tyGujxDKk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b2FSWW/btsQ2AQfAK2/Io7wSKnVGZdG4tyGujxDKk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb2FSWW%2FbtsQ2AQfAK2%2FIo7wSKnVGZdG4tyGujxDKk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;427&quot; height=&quot;397&quot; data-origin-width=&quot;427&quot; data-origin-height=&quot;397&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>Sparta/Theory</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/72</guid>
      <comments>https://junecho.tistory.com/72#entry72comment</comments>
      <pubDate>Thu, 2 Oct 2025 23:38:55 +0900</pubDate>
    </item>
    <item>
      <title>[251001] Machine Learning 04 - Classification</title>
      <link>https://junecho.tistory.com/71</link>
      <description>&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ Classification&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Predicts which category (class) a data point belongs to&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;e.g. spam filtering (spam/normal), disease diagnosis (positive/negative), manufacturing quality (defective/normal), etc.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Why use a classification model: binary outcomes (positive/negative, pass/fail, normal/defective, etc.) are intuitive to interpret&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Supervised learning&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Given input data (features) and the correct answers (labels), the model learns to predict the answer&lt;/li&gt;
&lt;li&gt;Regression: predicts a continuous value&lt;/li&gt;
&lt;li&gt;Classification: predicts a category&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  로지스틱 회귀 (Logistic Regression)&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;783&quot; data-origin-height=&quot;541&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/AjguU/btsQ0F4v570/CLDOOsLJTqF9P71HEsLPD0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/AjguU/btsQ0F4v570/CLDOOsLJTqF9P71HEsLPD0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/AjguU/btsQ0F4v570/CLDOOsLJTqF9P71HEsLPD0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FAjguU%2FbtsQ0F4v570%2FCLDOOsLJTqF9P71HEsLPD0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;577&quot; height=&quot;399&quot; data-origin-width=&quot;783&quot; data-origin-height=&quot;541&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Like linear regression, it takes a linear combination of the inputs, but it passes the result through the logistic (sigmoid) function to turn it into a probability between 0 and 1&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Advantages&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Fast to compute and simple to implement&lt;/li&gt;
&lt;li&gt;Easy to interpret (the regression coefficients show how each variable influences the prediction)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Disadvantages&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Limited ability to learn complex nonlinear patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
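&lt;p data-ke-size=&quot;size16&quot;&gt;The linear-combination-then-sigmoid step described above can be sketched in a few lines of plain Python. The weights, bias, and feature values are made up for illustration, the helper names are hypothetical, and the 0.5 decision threshold is an assumption rather than something the notes specify.&lt;/p&gt;

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias  # linear combination
    return sigmoid(z)


# Hypothetical two-feature sample and coefficients
p = predict_proba([2.0, 1.0], weights=[0.5, -1.0], bias=0.0)
label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
print(f"p = {p:.2f}, predicted label = {label}")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Each weight's sign shows whether its feature pushes the probability up or down, which is where the interpretability of the coefficients comes from.&lt;/p&gt;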
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;  로지스틱 회귀 코드&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt;  LogisticRegression()&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the data
iris = load_iris()
X = iris.data       # feature data
y = iris.target     # target data

# 2. Split the data (train : test = 8 : 2)
# stratify=y : keeps the class ratios similar in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42,
                                                    stratify=y)

# 3. Logistic Regression
logistic_model = LogisticRegression(max_iter=200)
logistic_model.fit(X_train, y_train)

# 4. Predict
y_pred_logistic = logistic_model.predict(X_test)

# 5. Evaluate performance

print(&quot;=== Logistic Regression ===&quot;)
print(&quot;Accuracy:&quot;, accuracy_score(y_test, y_pred_logistic))
print(classification_report(y_test, y_pred_logistic, target_names=iris.target_names))

&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;LogisticRegression()&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Trains a logistic regression model on the training data (X_train, y_train)&lt;/li&gt;
&lt;li&gt;max_iter (maximum number of iterations) is raised slightly from the default (100) to 200&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;accuracy_score computes the accuracy&lt;/li&gt;
&lt;li&gt;classification_report shows the per-class precision, recall, F1 score, and number of samples (support)&lt;/li&gt;
&lt;li&gt;target_names=iris.target_names labels each class by name (&amp;lsquo;setosa&amp;rsquo;, &amp;lsquo;versicolor&amp;rsquo;, &amp;lsquo;virginica&amp;rsquo;) so the report is easier to read&lt;/li&gt;
&lt;/ul&gt;
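&lt;p data-ke-size=&quot;size16&quot;&gt;As a small follow-up sketch (not part of the original notes), the fitted model can also return class probabilities through predict_proba, and its coef_ attribute holds the coefficients mentioned under interpretability. The variable names below are chosen only for this example.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Probability of each of the 3 classes for the first 3 test samples
proba = model.predict_proba(X_test[:3])
print(proba.round(3))

# One row of coefficients per class, one column per feature
print(model.coef_.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Each row of proba sums to 1, and predict() returns the class with the highest probability in that row.&lt;/p&gt;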
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  SVM (Support Vector Machine)&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
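&lt;p data-ke-size=&quot;size16&quot;&gt;An algorithm that finds the &lt;b&gt;boundary&lt;/b&gt; that separates the data &lt;b&gt;best&lt;/b&gt; (with the largest safety margin)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ex) It looks for a boundary that &lt;b&gt;separates&lt;/b&gt; two classes (e.g., cats vs. dogs) &lt;b&gt;well&lt;/b&gt;, chosen so that both classes end up &lt;b&gt;as far from the boundary as possible&lt;/b&gt; (so the margin is wide)&lt;/p&gt;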
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Advantages&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Can perform well even on high-dimensional data&lt;/li&gt;
&lt;li&gt;Predicts well when a clear decision boundary can be found
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;What is a decision boundary? &amp;rarr; the &lt;b&gt;optimal separating line&lt;/b&gt; (or hyperplane) found by the SVM&lt;/li&gt;
&lt;li&gt;ex) the dividing line that assigns one side to 'cat' and the other side to 'dog'&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Disadvantages&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Hyperparameters (C, kernel type, etc.) have to be chosen well, so tuning is costly&lt;/li&gt;
&lt;li&gt;Training can be slow on large datasets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
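&lt;p data-ke-size=&quot;size16&quot;&gt;The tuning cost mentioned above usually shows up as a search over candidate values. The following is a rough sketch using scikit-learn's GridSearchCV. The candidate values in param_grid are assumptions picked for illustration, not recommended settings.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

# Hypothetical candidate values; a sensible range depends on the data
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.1, 1],
    'kernel': ['rbf'],
}

# Every combination is trained with 5-fold cross-validation,
# which is why the cost grows quickly as the grid gets larger
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print('best params  :', search.best_params_)
print('test accuracy:', search.score(X_test, y_test))
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here 3 x 3 x 1 = 9 combinations are each fitted 5 times, so even this tiny grid trains 45 models before the final refit.&lt;/p&gt;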
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;  SVM 코드&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt;&lt;b&gt;  SVC()&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the data
iris = load_iris()
X = iris.data       # feature data
y = iris.target     # target data
print(X.shape)
print(y.shape)

# 2. Split the data (train : test = 8 : 2)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42,
                                                    stratify=y)

# 3. SVM (Support Vector Machine)
# Hyperparameters such as C and gamma can be set to optimize the model further.
svm_model = SVC()
svm_model.fit(X_train, y_train)

# 4. Predict
y_pred_svm = svm_model.predict(X_test)

# 5. Evaluate performance
# Compare using accuracy and the detailed report (classification_report).

print(&quot;=== SVM ===&quot;)
print(&quot;Accuracy:&quot;, accuracy_score(y_test, y_pred_svm))
print(classification_report(y_test, y_pred_svm, target_names=iris.target_names))
&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;SVM&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;SVC() uses the &amp;lsquo;rbf&amp;rsquo; kernel by default&lt;/li&gt;
&lt;li&gt;Other hyperparameters (C, gamma, etc.) can be adjusted to try to improve performance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;accuracy_score computes the accuracy&lt;/li&gt;
&lt;li&gt;classification_report shows the per-class precision, recall, F1 score, and number of samples (support)&lt;/li&gt;
&lt;li&gt;target_names=iris.target_names labels each class by name (&amp;lsquo;setosa&amp;rsquo;, &amp;lsquo;versicolor&amp;rsquo;, &amp;lsquo;virginica&amp;rsquo;) so the report is easier to read&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>Sparta/Theory</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/71</guid>
      <comments>https://junecho.tistory.com/71#entry71comment</comments>
      <pubDate>Wed, 1 Oct 2025 23:46:12 +0900</pubDate>
    </item>
    <item>
      <title>[250930] Machine Learning 03 - Regression</title>
      <link>https://junecho.tistory.com/70</link>
      <description>&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ 회귀 분석&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A statistical / machine learning technique that estimates the relationship between a dependent variable (Y) and one or more independent variables (X) in order to predict a continuous dependent variable&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ex) Predicting &amp;ldquo;how does the exam score (Y) change with the hours studied (X)?&amp;rdquo;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;  &lt;b&gt;개요&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Classification vs. regression&lt;/b&gt; in &lt;b&gt;supervised learning&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Classification&lt;/b&gt; : the output is &lt;b&gt;discrete&lt;/b&gt; (a class label)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Regression&lt;/b&gt; : the output is &lt;b&gt;continuous&lt;/b&gt; (a numeric value)&lt;/li&gt;
&lt;li&gt;(Reference) Artificial intelligence : the broad concept of having machines carry out tasks that require human intelligence&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Why use&lt;/b&gt; a regression model
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;1️⃣ Predicting future values&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;: used to predict real-valued quantities ex) sales volume, stock prices, temperature, etc.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;2️⃣ Interpreting causal relationships&lt;/b&gt; (statistical view) : to interpret how much a given independent variable influences the dependent variable&lt;/li&gt;
&lt;li&gt;&lt;b&gt;3️⃣ Data-driven decision making&lt;/b&gt; : identifying trends, allocating resources, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
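&lt;li&gt;Typical use cases for regression models
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Economics - predicting stock prices, sales volume&lt;/li&gt;
&lt;li&gt;Health - predicting blood pressure, cholesterol levels&lt;/li&gt;
&lt;li&gt;Manufacturing - predicting defect rates, production volume&lt;/li&gt;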
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  선형 회귀 (Linear Regression)&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;독립변수&lt;/b&gt;(X)와 &lt;b&gt;종속변수&lt;/b&gt;(Y)가 &lt;b&gt;선형적&lt;/b&gt;(일차 방정식 형태)으로 관계를 맺고 있다고 가정&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Linear relationship&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: &lt;b&gt;a relationship in which, as one variable increases, the other increases or decreases at a constant rate&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ex) weight tends to increase as height increases / exam scores tend to rise as study time increases&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Advantages&lt;/b&gt; : simple to interpret, easy to implement&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Disadvantages&lt;/b&gt; : predictive power drops when the data is not linear&lt;/li&gt;
&lt;li&gt;With a single independent variable the fit is a straight line; with two it is a plane, and with more it is a hyperplane&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt; Regression Equation &lt;/b&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;517&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xg4ql/btsQXYb1Gbc/MN3ELkRRScjV2wUtdxZk41/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xg4ql/btsQXYb1Gbc/MN3ELkRRScjV2wUtdxZk41/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xg4ql/btsQXYb1Gbc/MN3ELkRRScjV2wUtdxZk41/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fxg4ql%2FbtsQXYb1Gbc%2FMN3ELkRRScjV2wUtdxZk41%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;517&quot; height=&quot;57&quot; data-origin-width=&quot;517&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&amp;beta;0 : intercept (= bias)&lt;/li&gt;
&lt;li&gt;&amp;beta;i : regression coefficient of each independent variable (= coefficient of x = weight = parameter = beta)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; The more independent variables there are, the more terms the equation has&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;How a Linear Regression Model Is Trained&lt;/b&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1️⃣ Initialize the weights (regression coefficients) &amp;rArr; the beta values are unknown at first, so they start from arbitrary values&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2️⃣ Choose a loss function : MSE (Mean Squared Error) is the usual choice&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;❓ Loss function : a measure of how large the error (loss) of the predictions is&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;rArr; good beta values are the ones that give a small error&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3️⃣ Optimize : update the weights with an analytical method (ordinary least squares) or with gradient descent&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4️⃣ After training : use the learned &amp;beta;0, &amp;beta;1, &amp;hellip; to make predictions for new input values&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Example
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Data
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;X = hours studied, Y = exam score&lt;/li&gt;
&lt;li&gt;(1 hour, 40 points), (2 hours, 50 points), (3 hours, 60 points), (4 hours, 70 points) &amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Model&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;291&quot; data-origin-height=&quot;62&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bwXsDN/btsQVxAqoVd/30fkq2vIYgDOqlXvXToGvk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bwXsDN/btsQVxAqoVd/30fkq2vIYgDOqlXvXToGvk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bwXsDN/btsQVxAqoVd/30fkq2vIYgDOqlXvXToGvk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbwXsDN%2FbtsQVxAqoVd%2F30fkq2vIYgDOqlXvXToGvk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;291&quot; height=&quot;62&quot; data-origin-width=&quot;291&quot; data-origin-height=&quot;62&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1 hour of study &amp;rarr; 40 points, 2 hours of study &amp;rarr; 50 points &amp;hellip;&lt;/p&gt;
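&lt;p data-ke-size=&quot;size16&quot;&gt;The four training steps can be tried on the study-time data from the example. This is only a sketch: the learning rate and the number of iterations are assumed values chosen so that this tiny dataset converges, and the least-squares result is computed with NumPy for comparison.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np

hours = np.array([1.0, 2.0, 3.0, 4.0])
scores = np.array([40.0, 50.0, 60.0, 70.0])

# (a) Analytical solution: ordinary least squares
A = np.column_stack([np.ones_like(hours), hours])   # columns: [1, x]
beta_ols, *_ = np.linalg.lstsq(A, scores, rcond=None)

# (b) Gradient descent on the MSE loss
beta0, beta1 = 0.0, 0.0   # step 1: arbitrary starting values
lr = 0.05                 # assumed learning rate
for _ in range(5000):     # assumed number of iterations
    error = (beta0 + beta1 * hours) - scores   # prediction minus target
    beta0 -= lr * 2 * error.mean()             # d(MSE)/d(beta0)
    beta1 -= lr * 2 * (error * hours).mean()   # d(MSE)/d(beta1)

print('least squares   :', beta_ols)
print('gradient descent:', beta0, beta1)
print('prediction for 5 hours:', beta0 + beta1 * 5)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because the four points lie exactly on a line, both approaches arrive at an intercept of about 30 and a slope of about 10, i.e. score = 30 + 10 x hours.&lt;/p&gt;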
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  선형회귀 코드&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;haskell&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Load the data
diabetes = load_diabetes()
X = diabetes.data       # X : features (independent variables)
y = diabetes.target     # y : target (dependent variable)

print(X.shape)
print(y.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;95&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nvq0e/btsQVrUuF7Z/VoPRwVpGRVcF61yzgkCQD1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nvq0e/btsQVrUuF7Z/VoPRwVpGRVcF61yzgkCQD1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nvq0e/btsQVrUuF7Z/VoPRwVpGRVcF61yzgkCQD1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fnvq0e%2FbtsQVrUuF7Z%2FVoPRwVpGRVcF61yzgkCQD1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;95&quot; height=&quot;57&quot; data-origin-width=&quot;95&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  train_test_split()&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 2. Split into training and test data
# 80% for training, 20% for testing (random_state=42 for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;92&quot; data-origin-height=&quot;100&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/beFJsR/btsQYJrYPFN/635HOeSRyDViKeHH8fOtHk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/beFJsR/btsQYJrYPFN/635HOeSRyDViKeHH8fOtHk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/beFJsR/btsQYJrYPFN/635HOeSRyDViKeHH8fOtHk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbeFJsR%2FbtsQYJrYPFN%2F635HOeSRyDViKeHH8fOtHk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;92&quot; height=&quot;100&quot; data-origin-width=&quot;92&quot; data-origin-height=&quot;100&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: used to split the full dataset into a training set and a test set&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  LinearRegression()&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# 3. Linear regression (LinearRegression) model
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;275&quot; data-origin-height=&quot;265&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nrZFr/btsQYtCTiHk/KJtcNwH8McAXEdK3zxgjs0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nrZFr/btsQYtCTiHk/KJtcNwH8McAXEdK3zxgjs0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nrZFr/btsQYtCTiHk/KJtcNwH8McAXEdK3zxgjs0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnrZFr%2FbtsQYtCTiHk%2FKJtcNwH8McAXEdK3zxgjs0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;275&quot; height=&quot;265&quot; data-origin-width=&quot;275&quot; data-origin-height=&quot;265&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
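&lt;li&gt;&lt;b&gt;LinearRegression()&lt;/b&gt; : aims to find the linear relationship between the dependent variable (target) and one or more independent variables (features)&lt;/li&gt;
&lt;li&gt;fit() : the method called on the model object to train it&lt;/li&gt;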
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  predict()&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Predict
y_pred_lin = lin_reg.predict(X_test)

# Measure performance
mse_lin = mean_squared_error(y_test, y_pred_lin)
r2_lin = r2_score(y_test, y_pred_lin)

# Mean percentage error (MPE): average signed error relative to the true value, in %
# (positive and negative errors can cancel each other out)
def MPE(y_true, y_pred):
    return np.mean((y_true - y_pred) / y_true) * 100

print(&quot;[LinearRegression results]&quot;)
print(&quot;Weights (coefficients):&quot;, lin_reg.coef_)
print(&quot;Intercept:&quot;, lin_reg.intercept_)
print(&quot;MSE:&quot;, mse_lin)
print(&quot;R2 score:&quot;, r2_lin)
print(&quot;Mean percentage error: &quot;, MPE(y_test, y_pred_lin))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;807&quot; data-origin-height=&quot;184&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/kGPph/btsQVw9jNEZ/hbOBrwpW3NALxy3oHuztFk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/kGPph/btsQVw9jNEZ/hbOBrwpW3NALxy3oHuztFk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/kGPph/btsQVw9jNEZ/hbOBrwpW3NALxy3oHuztFk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FkGPph%2FbtsQVw9jNEZ%2FhbOBrwpW3NALxy3oHuztFk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;807&quot; height=&quot;184&quot; data-origin-width=&quot;807&quot; data-origin-height=&quot;184&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
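&lt;li&gt;&lt;b&gt;predict()&lt;/b&gt; : makes predictions for the test set&lt;/li&gt;
&lt;li&gt;&lt;b&gt;mean_squared_error(y_true, y_pred)&lt;/b&gt; : computes the mean squared error (MSE), the average of the squared differences between the true values and the predictions&lt;/li&gt;
&lt;li&gt;&lt;b&gt;r2_score(y_true, y_pred)&lt;/b&gt; : the coefficient of determination (R2), a score for how well the predictions fit; 1 is a perfect fit&lt;/li&gt;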
&lt;/ul&gt;
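&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; X has 10 columns, so there are 10 weights.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The intercept (&amp;beta;0) is the constant term that is learned during training together with the weights: it is the predicted value when every feature is 0, not an arbitrarily assigned number.&lt;/p&gt;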
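&lt;p data-ke-size=&quot;size16&quot;&gt;To see what the metrics above actually compute, here is a small sketch on made-up numbers that reproduces MSE and R2 by hand and checks them against scikit-learn. The absolute-value variant (MAPE) is an addition for comparison and is not used in the original notes.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions (illustrative only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: mean of the squared errors
mse_manual = np.mean((y_true - y_pred) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# MPE keeps the sign of each error, MAPE takes the absolute value first
mpe = np.mean((y_true - y_pred) / y_true) * 100
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print('MSE :', mse_manual, mean_squared_error(y_true, y_pred))
print('R2  :', r2_manual, r2_score(y_true, y_pred))
print('MPE :', mpe)
print('MAPE:', mape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here one prediction is too low and one is too high, so the signed MPE partly cancels out and comes out smaller than the MAPE. An MPE close to 0 therefore does not by itself mean the predictions are accurate.&lt;/p&gt;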
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  SGDRegressor()&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 4. SGDRegressor model
sgd_reg = SGDRegressor(max_iter=6000, tol=1e-3, random_state=42)
sgd_reg.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;418&quot; data-origin-height=&quot;717&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nr2rL/btsQXajRs2Z/7McVCfrxTVm6zM7N32Wp20/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nr2rL/btsQXajRs2Z/7McVCfrxTVm6zM7N32Wp20/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nr2rL/btsQXajRs2Z/7McVCfrxTVm6zM7N32Wp20/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fnr2rL%2FbtsQXajRs2Z%2F7McVCfrxTVm6zM7N32Wp20%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;418&quot; height=&quot;717&quot; data-origin-width=&quot;418&quot; data-origin-height=&quot;717&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;SGDRegressor()&lt;/b&gt; : a linear model trained with stochastic gradient descent (SGD)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Predict
y_pred_sgd = sgd_reg.predict(X_test)

# Measure performance
mse_sgd = mean_squared_error(y_test, y_pred_sgd)
r2_sgd = r2_score(y_test, y_pred_sgd)

# Mean percentage error (MPE)
def MPE(y_true, y_pred):
    return np.mean((y_true - y_pred) / y_true) * 100

print(&quot;[SGDRegressor results]&quot;)
print(&quot;Weights (coefficients):&quot;, sgd_reg.coef_)
print(&quot;Intercept:&quot;, sgd_reg.intercept_)
print(&quot;MSE:&quot;, mse_sgd)
print(&quot;R2 score:&quot;, r2_sgd)
print(&quot;Mean percentage error: &quot;, MPE(y_test, y_pred_sgd))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;792&quot; data-origin-height=&quot;181&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Ae5HI/btsQVFZlPmM/DRjZUJMi1WCkh7jkKPeeNk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Ae5HI/btsQVFZlPmM/DRjZUJMi1WCkh7jkKPeeNk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Ae5HI/btsQVFZlPmM/DRjZUJMi1WCkh7jkKPeeNk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FAe5HI%2FbtsQVFZlPmM%2FDRjZUJMi1WCkh7jkKPeeNk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;792&quot; height=&quot;181&quot; data-origin-width=&quot;792&quot; data-origin-height=&quot;181&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;After predicting, check the model's performance with the MSE and R2 score&lt;/li&gt;
&lt;/ul&gt;
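&lt;p data-ke-size=&quot;size16&quot;&gt;One practical note that the notes above do not cover: stochastic gradient descent is sensitive to the scale of the features, and scikit-learn's documentation recommends standardizing them. The sketch below wraps the same SGDRegressor settings in a pipeline with StandardScaler. The pipeline itself is an assumption added for illustration, not the setup used above.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The scaler is fitted on the training data only, then applied to the test data
scaled_sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=6000, tol=1e-3, random_state=42),
)
scaled_sgd.fit(X_train, y_train)

r2_scaled = r2_score(y_test, scaled_sgd.predict(X_test))
print('R2 with scaling:', r2_scaled)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Putting the scaler inside the pipeline keeps the test set out of the scaling statistics, so the evaluation stays comparable to the LinearRegression result.&lt;/p&gt;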
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  다항 회귀 (Polynomial Regression)&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Models a nonlinear relationship in the form of a &lt;b&gt;polynomial&lt;/b&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) a second-degree polynomial: $y = \beta_0 + \beta_1 X + \beta_2 X^2$&lt;/li&gt;
&lt;li&gt;Difference from linear regression : besides the plain linear term (X), it adds &lt;b&gt;higher-order terms&lt;/b&gt; such as $X^2, X^3, \dots$ so that it can learn nonlinear patterns&lt;/li&gt;
&lt;li&gt;Example applications
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A manufacturing process where the relationship between temperature and reaction rate is curved&lt;/li&gt;
&lt;li&gt;Health data where age and some indicator (muscle mass, etc.) follow a curve rather than a straight line&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Cautions
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Adding higher-order terms indiscriminately fits the training data too closely and causes &lt;b&gt;overfitting&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Model complexity has to be balanced against generalization performance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
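&lt;p data-ke-size=&quot;size16&quot;&gt;A quick way to see why the higher-order term matters is to fit a straight line and a second-degree polynomial to the same curved data. This sketch uses NumPy's polyfit on noiseless, made-up data rather than the scikit-learn workflow, purely to keep it short.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np

# Made-up curved data: y = 2x^2 + 1 with no noise
x = np.linspace(-3, 3, 20)
y = 2 * x ** 2 + 1

def fit_mse(degree):
    # Fit a polynomial of the given degree and return its training MSE
    coefs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coefs, x)
    return np.mean((y - y_hat) ** 2)

mse_linear = fit_mse(1)      # straight line: cannot follow the curve
mse_quadratic = fit_mse(2)   # includes the x^2 term

print('degree 1 MSE:', mse_linear)
print('degree 2 MSE:', mse_quadratic)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The degree-2 fit drives the error to essentially zero while the line cannot. On noisy real data, raising the degree keeps lowering the training error, which is exactly the overfitting risk noted above, so the degree should be judged on held-out data.&lt;/p&gt;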
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&amp;nbsp;&lt;/h3&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  다항회귀 코드&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;coffeescript&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline

# 1) Generate nonlinear data (make_friedman1)
# n_samples: number of samples, n_features: number of features, noise: noise level
X, y = make_friedman1(n_samples=1000, n_features=5, noise=1.0, random_state=42)
print(X.shape)
print(y.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;100&quot; data-origin-height=&quot;59&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bDDO9A/btsQYLchhOo/3PBOEyKrWQhwYXCUphY2AK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bDDO9A/btsQYLchhOo/3PBOEyKrWQhwYXCUphY2AK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bDDO9A/btsQYLchhOo/3PBOEyKrWQhwYXCUphY2AK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbDDO9A%2FbtsQYLchhOo%2F3PBOEyKrWQhwYXCUphY2AK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;100&quot; height=&quot;59&quot; data-origin-width=&quot;100&quot; data-origin-height=&quot;59&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 2) Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;85&quot; data-origin-height=&quot;102&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bG7pSu/btsQXaKVMqq/Ln5rVbpjOY5Ig8OQ3f1snK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bG7pSu/btsQXaKVMqq/Ln5rVbpjOY5Ig8OQ3f1snK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bG7pSu/btsQXaKVMqq/Ln5rVbpjOY5Ig8OQ3f1snK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbG7pSu%2FbtsQXaKVMqq%2FLn5rVbpjOY5Ig8OQ3f1snK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;85&quot; height=&quot;102&quot; data-origin-width=&quot;85&quot; data-origin-height=&quot;102&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  LinearRegression()&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;reasonml&quot;&gt;&lt;code&gt;# 3) Simple linear regression model (baseline for comparison)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred_lin = lin_reg.predict(X_test)

mse_lin = mean_squared_error(y_test, y_pred_lin)
r2_lin = r2_score(y_test, y_pred_lin)

# Mean percentage error (signed, so over- and under-predictions can cancel out)
def MPE(y_true, y_pred):
    return np.mean((y_true - y_pred) / y_true) * 100

print(&quot;[Simple linear regression results]&quot;)
print(&quot;MSE:&quot;, mse_lin)
print(&quot;R2:&quot;, r2_lin)
print(&quot;Mean percentage error: &quot;, MPE(y_test, y_pred_lin))
print()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;346&quot; data-origin-height=&quot;108&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dsy9ey/btsQVwaoKYq/mq7fJ6n4kUZEWhbVzEVk30/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dsy9ey/btsQVwaoKYq/mq7fJ6n4kUZEWhbVzEVk30/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dsy9ey/btsQVwaoKYq/mq7fJ6n4kUZEWhbVzEVk30/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdsy9ey%2FbtsQVwaoKYq%2Fmq7fJ6n4kUZEWhbVzEVk30%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;346&quot; height=&quot;108&quot; data-origin-width=&quot;346&quot; data-origin-height=&quot;108&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; Measures performance with MSE, R^2, and MPE when only &lt;b&gt;LinearRegression&lt;/b&gt; is applied, without accounting for nonlinearity&lt;/p&gt;
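&lt;p data-ke-size=&quot;size16&quot;&gt;One caveat about the MPE helper used above: it averages signed errors, so over- and under-predictions can cancel each other. The sketch below uses plain Python and made-up numbers (not the Friedman1 results) to compare it with &lt;b&gt;mape&lt;/b&gt;, a hypothetical helper that is not part of the original code and averages absolute percentage errors instead.&lt;/p&gt;

```python
# Illustrative values only; they are not taken from the post's dataset.
def mpe(y_true, y_pred):
    """Signed mean percentage error, same formula as MPE in the post."""
    return sum((t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true) * 100


def mape(y_true, y_pred):
    """Mean absolute percentage error (assumed helper, not in the post)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true) * 100


y_true = [10.0, 10.0]
y_pred = [12.0, 8.0]  # one over-prediction and one under-prediction of 20%

print(mpe(y_true, y_pred))   # signed errors cancel out
print(mape(y_true, y_pred))  # absolute errors do not
```

&lt;p data-ke-size=&quot;size16&quot;&gt;An MPE close to 0 therefore says little on its own; it mainly indicates whether predictions are biased in one direction.&lt;/p&gt;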
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  PolynomialFeatures()&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 4) Polynomial Regression (degree-2 example)
poly_model = Pipeline([
    (&quot;poly&quot;, PolynomialFeatures(degree=2, include_bias=False)),
    (&quot;lin_reg&quot;, LinearRegression())
])
poly_model.fit(X_train, y_train)
y_pred_poly = poly_model.predict(X_test)

mse_poly = mean_squared_error(y_test, y_pred_poly)
r2_poly = r2_score(y_test, y_pred_poly)

# Mean percentage error (signed, so over- and under-predictions can cancel out)
def MPE(y_true, y_pred):
    return np.mean((y_true - y_pred) / y_true) * 100

print(&quot;[Polynomial regression (degree 2) results]&quot;)
print(&quot;MSE:&quot;, mse_poly)
print(&quot;R2:&quot;, r2_poly)
print(&quot;Mean percentage error: &quot;, MPE(y_test, y_pred_poly))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;329&quot; data-origin-height=&quot;106&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xZCHx/btsQYH8LOJc/0uWyFkaLN7Y3XKn6U7Rky1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xZCHx/btsQYH8LOJc/0uWyFkaLN7Y3XKn6U7Rky1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xZCHx/btsQYH8LOJc/0uWyFkaLN7Y3XKn6U7Rky1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxZCHx%2FbtsQYH8LOJc%2F0uWyFkaLN7Y3XKn6U7Rky1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;329&quot; height=&quot;106&quot; data-origin-width=&quot;329&quot; data-origin-height=&quot;106&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
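&lt;li&gt;Build a &lt;b&gt;pipeline&lt;/b&gt; that first transforms the features with PolynomialFeatures(degree=2) so that terms up to degree 2 are included, and then applies linear regression to the expanded features&lt;/li&gt;
&lt;li&gt;Because it can learn &lt;b&gt;nonlinear&lt;/b&gt; patterns to some extent, it is expected to outperform simple linear regression (though the risk of overfitting also grows)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Comparing results&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Compare MSE, R^2, and so on to see how polynomial regression differs from simple linear regression on the Friedman1 dataset&lt;/li&gt;
&lt;li&gt;Applying a higher degree (e.g., 3 or 4) or using another nonlinear model (e.g., random forest, SVM regression) may change the performance&lt;/li&gt;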
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
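&lt;p data-ke-size=&quot;size16&quot;&gt;To make the degree-2 transformation concrete, here is a rough stdlib-only sketch of the expansion. &lt;b&gt;poly_features&lt;/b&gt; is a hypothetical helper written for this note, not scikit-learn code; to my understanding it yields the same terms, in the same order, as PolynomialFeatures(degree=2, include_bias=False) does for a two-feature input.&lt;/p&gt;

```python
from itertools import combinations_with_replacement


def poly_features(row, degree=2):
    """Expand one sample into all monomials of degree 1..degree (no bias term)."""
    features = []
    for d in range(1, degree + 1):
        # Each combination of column indices corresponds to one monomial.
        for combo in combinations_with_replacement(range(len(row)), d):
            term = 1.0
            for idx in combo:
                term *= row[idx]
            features.append(term)
    return features


# Terms for [x1, x2]: x1, x2, x1^2, x1*x2, x2^2
print(poly_features([2.0, 3.0]))

# The number of columns grows quickly: a 10-feature input becomes 65 columns.
print(len(poly_features([1.0] * 10)))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The fast growth in the number of columns is one reason higher degrees raise the overfitting risk noted above.&lt;/p&gt;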
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  회귀 모델 평가 방법&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  MSE (Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
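&lt;li&gt;Average of the &lt;b&gt;squared&lt;/b&gt; differences between predicted and actual values&lt;/li&gt;
&lt;li&gt;Because squaring penalizes larger errors more heavily, it is especially sensitive to &lt;b&gt;large errors&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;One of the most frequently used metrics for evaluating regression models; its unit is the square of the target's unit&lt;/li&gt;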
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  MAE (Mean Absolute Error)&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Average of the &lt;b&gt;absolute&lt;/b&gt; differences between predicted and actual values&lt;/li&gt;
&lt;li&gt;Shows intuitively how far predictions deviate from the actual values on average&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
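&lt;li&gt;The square root of MSE, so it is expressed in the same unit as the target; unlike MAE, squaring gives larger errors more weight&lt;/li&gt;
&lt;li&gt;The penalty grows with the size of the error, so it is often used in problems where large errors matter most&lt;/li&gt;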
&lt;/ul&gt;
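&lt;p data-ke-size=&quot;size16&quot;&gt;The three error metrics above can be sketched in a few lines of plain Python. The helper names and the numbers are made up for illustration; in practice the scikit-learn functions used elsewhere in this post would do the same job. The two prediction sets have the same MAE, but one concentrates all of its error in a single point.&lt;/p&gt;

```python
import math


def mse(y_true, y_pred):
    """Mean of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, in the same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred_even = [4.0, 6.0, 8.0, 10.0]   # four errors of size 1
y_pred_spike = [3.0, 5.0, 7.0, 13.0]  # a single error of size 4

for name, y_pred in [('even', y_pred_even), ('spike', y_pred_spike)]:
    print(name, mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;MAE is 1.0 in both cases, while MSE and RMSE are higher for the set with the single large error, which is the sensitivity described above.&lt;/p&gt;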
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  R&amp;sup2; (결정 계수)&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
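&lt;li&gt;R&amp;sup2; = 1 - Σ(yᵢ - ŷᵢ)&amp;sup2; / Σ(yᵢ - ȳ)&amp;sup2;, where ȳ is the mean of the dependent variable&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Range of values&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;At most 1 and usually between 0 and 1, but it can be negative when the model fits worse than always predicting the mean&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Interpretation&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The closer to 1, the better the fitted model explains the data&lt;/li&gt;
&lt;li&gt;A value of 0 means the model explains none of the variance in the dependent variable, i.e., it does no better than always predicting the mean&lt;/li&gt;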
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
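&lt;p data-ke-size=&quot;size16&quot;&gt;The three regimes of R&amp;sup2; (1, 0, and negative) can be checked by hand. The sketch below assumes the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares; the helper name &lt;b&gt;r2&lt;/b&gt; and the numbers are illustrative only.&lt;/p&gt;

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

print(r2(y_true, [1.0, 2.0, 3.0, 4.0]))  # perfect predictions
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
print(r2(y_true, [4.0, 3.0, 2.0, 1.0]))  # worse than predicting the mean
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The last case is negative because its squared error exceeds that of the constant mean prediction.&lt;/p&gt;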
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  고급 회귀 기법 - Ridge &amp;amp; Lasso Regression&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Adds a &lt;b&gt;regularization&lt;/b&gt; term to linear regression to prevent overfitting&amp;nbsp;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;  &lt;b&gt;Ridge(릿지) 회귀&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Adds the sum of squared weights (squared L2 norm) as a penalty&lt;/li&gt;
&lt;li&gt;Effect: keeps the weights from growing too large (&lt;b&gt;shrinks them smoothly&lt;/b&gt; toward 0)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;  &lt;b&gt;Lasso(라쏘) 회귀&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Adds the sum of absolute weights (L1 norm) as a penalty&lt;/li&gt;
&lt;li&gt;Effect: can drive some weights to exactly 0, which acts as feature selection&lt;/li&gt;
&lt;/ul&gt;
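&lt;p data-ke-size=&quot;size16&quot;&gt;The different effects of the two penalties can be seen in a simplified one-coefficient setting. The sketch below is not scikit-learn's solver: it assumes the toy objectives written in the docstrings, where &lt;b&gt;w_ols&lt;/b&gt; stands for the unregularized coefficient, and the helper names and numbers are hypothetical. Under those assumptions the L2 penalty rescales every weight, while the L1 penalty subtracts a fixed amount and clips at 0.&lt;/p&gt;

```python
def ridge_shrink(w_ols, alpha):
    """Minimizer of 0.5 * (w - w_ols)**2 + 0.5 * alpha * w**2."""
    return w_ols / (1.0 + alpha)


def lasso_shrink(w_ols, alpha):
    """Minimizer of 0.5 * (w - w_ols)**2 + alpha * abs(w) (soft-thresholding)."""
    if w_ols > alpha:
        return w_ols - alpha
    if -w_ols > alpha:
        return w_ols + alpha
    return 0.0  # weights no larger than alpha in magnitude are set to exactly 0


weights = [3.0, 0.5, -0.2]  # made-up unregularized coefficients
alpha = 1.0

print([ridge_shrink(w, alpha) for w in weights])
print([lasso_shrink(w, alpha) for w in weights])
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With these numbers Ridge keeps all three weights nonzero, whereas Lasso zeroes the two small ones. The alpha here is not directly comparable to the alpha argument of the scikit-learn estimators, whose objectives are scaled differently.&lt;/p&gt;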
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  릿지회귀 &amp;amp; 라쏘회귀 코드&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;haskell&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_squared_error, r2_score

# 1. Load the data
housing = fetch_california_housing()
X = housing.data
y = housing.target

print(X.shape)
print(y.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;135&quot; data-origin-height=&quot;59&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vSEGh/btsQ4lYPDYx/1ERrWZDkjBiwcP12pnZpB1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vSEGh/btsQ4lYPDYx/1ERrWZDkjBiwcP12pnZpB1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vSEGh/btsQ4lYPDYx/1ERrWZDkjBiwcP12pnZpB1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvSEGh%2FbtsQ4lYPDYx%2F1ERrWZDkjBiwcP12pnZpB1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;135&quot; height=&quot;59&quot; data-origin-width=&quot;135&quot; data-origin-height=&quot;59&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 2. Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;127&quot; data-origin-height=&quot;104&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bL6vls/btsQ0t5NlSL/tFCZorYwlM00uY782i0sV0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bL6vls/btsQ0t5NlSL/tFCZorYwlM00uY782i0sV0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bL6vls/btsQ0t5NlSL/tFCZorYwlM00uY782i0sV0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbL6vls%2FbtsQ0t5NlSL%2FtFCZorYwlM00uY782i0sV0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;127&quot; height=&quot;104&quot; data-origin-width=&quot;127&quot; data-origin-height=&quot;104&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;  &lt;span style=&quot;background-color: #dddddd;&quot;&gt;&lt;b&gt;Ridge()&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 3. Ridge regression
# alpha=1.0 (regularization strength) can be tuned as needed
ridge_reg = Ridge(alpha=1.0, random_state=42)
ridge_reg.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;283&quot; data-origin-height=&quot;360&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Hx6rt/btsQ4lR3Efp/MRR7MEnS8llAFJLyKKtC9k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Hx6rt/btsQ4lR3Efp/MRR7MEnS8llAFJLyKKtC9k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Hx6rt/btsQ4lR3Efp/MRR7MEnS8llAFJLyKKtC9k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHx6rt%2FbtsQ4lR3Efp%2FMRR7MEnS8llAFJLyKKtC9k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;283&quot; height=&quot;360&quot; data-origin-width=&quot;283&quot; data-origin-height=&quot;360&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
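&lt;li&gt;A Ridge model, which includes an L2 regularization term&lt;/li&gt;
&lt;li&gt;The larger alpha is, the stronger the regularization, constraining the size of the model weights (coefficients) more tightly&lt;/li&gt;
&lt;li&gt;After training, the &lt;b&gt;MSE&lt;/b&gt; and &lt;b&gt;R^2&lt;/b&gt; scores are computed on the predictions&lt;/li&gt;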
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Predict
y_pred_ridge = ridge_reg.predict(X_test)

# Evaluate performance
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)

# Mean percentage error (signed, so over- and under-predictions can cancel out)
def MPE(y_true, y_pred):
    return np.mean((y_true - y_pred) / y_true) * 100

print(&quot;[Ridge regression results]&quot;)
print(&quot;  Weights (coefficients):&quot;, ridge_reg.coef_)
print(&quot;  Intercept:&quot;, ridge_reg.intercept_)
print(&quot;  MSE:&quot;, mse_ridge)
print(&quot;  R^2 score:&quot;, r2_ridge)
print(&quot;Mean percentage error: &quot;, MPE(y_test, y_pred_ridge))
print()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;973&quot; data-origin-height=&quot;183&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/UUGyi/btsQ3xrUNJk/WzwcGcE3mEkS3vf7gUnskK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/UUGyi/btsQ3xrUNJk/WzwcGcE3mEkS3vf7gUnskK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/UUGyi/btsQ3xrUNJk/WzwcGcE3mEkS3vf7gUnskK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FUUGyi%2FbtsQ3xrUNJk%2FWzwcGcE3mEkS3vf7gUnskK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;973&quot; height=&quot;183&quot; data-origin-width=&quot;973&quot; data-origin-height=&quot;183&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt; &lt;span style=&quot;background-color: #dddddd;&quot;&gt; Lasso()&lt;/span&gt;&lt;/b&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# 4. Lasso regression
# alpha is lowered to 0.1 here (the default is 1.0)
# If alpha is too large, the weights are driven to 0 and the model may underfit.
lasso_reg = Lasso(alpha=0.1, random_state=42, max_iter=10000)
lasso_reg.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;298&quot; data-origin-height=&quot;417&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bfUzic/btsQ2pakaNS/x9oYpGXz9PZZXJqgARsDQ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bfUzic/btsQ2pakaNS/x9oYpGXz9PZZXJqgARsDQ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bfUzic/btsQ2pakaNS/x9oYpGXz9PZZXJqgARsDQ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbfUzic%2FbtsQ2pakaNS%2Fx9oYpGXz9PZZXJqgARsDQ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;298&quot; height=&quot;417&quot; data-origin-width=&quot;298&quot; data-origin-height=&quot;417&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A Lasso model, which uses an L1 regularization term&lt;/li&gt;
&lt;li&gt;The larger alpha is, the more weights become &lt;b&gt;exactly 0&lt;/b&gt; (feature-selection effect)&lt;/li&gt;
&lt;li&gt;Performance is again evaluated with &lt;b&gt;MSE&lt;/b&gt; and &lt;b&gt;R^2&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Predict
y_pred_lasso = lasso_reg.predict(X_test)

# Evaluate performance
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)

# Mean percentage error (signed, so over- and under-predictions can cancel out)
def MPE(y_true, y_pred):
    return np.mean((y_true - y_pred) / y_true) * 100

print(&quot;[Lasso regression results]&quot;)
print(&quot;  Weights (coefficients):&quot;, lasso_reg.coef_)
print(&quot;  Intercept:&quot;, lasso_reg.intercept_)
print(&quot;  MSE:&quot;, mse_lasso)
print(&quot;  R^2 score:&quot;, r2_lasso)
print(&quot;Mean percentage error: &quot;, MPE(y_test, y_pred_lasso))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;946&quot; data-origin-height=&quot;178&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/clKdde/btsQ2yrkDir/JNIAKAZE9bbHoCXpZPzFKk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/clKdde/btsQ2yrkDir/JNIAKAZE9bbHoCXpZPzFKk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/clKdde/btsQ2yrkDir/JNIAKAZE9bbHoCXpZPzFKk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FclKdde%2FbtsQ2yrkDir%2FJNIAKAZE9bbHoCXpZPzFKk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;946&quot; height=&quot;178&quot; data-origin-width=&quot;946&quot; data-origin-height=&quot;178&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;  정리 / Q&amp;amp;A&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  정리&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Regression models are used to predict a &lt;b&gt;continuous&lt;/b&gt; outcome variable&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Linear regression&lt;/b&gt; is the most basic form; when the pattern in the data is nonlinear, consider alternatives such as &lt;b&gt;polynomial regression&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Models that use &lt;b&gt;regularization&lt;/b&gt; (Lasso, Ridge) constrain the weights to prevent overfitting&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Ensemble&lt;/b&gt; methods (Gradient Boosting, XGBoost, etc.) can capture complex nonlinear patterns better (covered later)&lt;/li&gt;
&lt;li&gt;Model performance is evaluated with a range of metrics such as MAE, RMSE, and R&amp;sup2;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  Q&amp;amp;A&amp;nbsp; &amp;nbsp; &lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
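&lt;p data-ke-size=&quot;size16&quot;&gt;  &lt;b&gt;Q1: Should I choose linear regression or polynomial regression?&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A1: Check the data distribution and the residual (error) pattern; if a simple linear model cannot explain the data well, consider polynomial regression. Another option is to fit both linear and polynomial regression first and then decide based on the results.&lt;/p&gt;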
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
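&lt;p data-ke-size=&quot;size16&quot;&gt;  &lt;b&gt;Q2: Which regularization technique should I use, Lasso or Ridge?&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A2: Lasso is better suited to feature selection (it removes unnecessary variables by driving their weights to 0), while Ridge shrinks all weights smoothly and is better suited to improving model stability. Choose according to the goal of the analysis and the characteristics of the data, or combine the two, as Elastic Net does.&lt;/p&gt;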
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
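&lt;p data-ke-size=&quot;size16&quot;&gt;  &lt;b&gt;Q3: Are ensemble methods always better than linear regression?&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A3: Not always. Ensemble methods often achieve higher predictive performance, but the outcome depends on the size and characteristics of the data and on the complexity of the problem. They are also harder to tune and more expensive to compute, so choose according to the situation.&lt;/p&gt;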
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
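&lt;p data-ke-size=&quot;size16&quot;&gt;  &lt;b&gt;Q4: Does performance always improve as a regression model gets more independent variables?&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A4: No. More independent variables make the model more complex and more likely to overfit. Remove unnecessary variables through feature selection or regularization, and evaluate the model's generalization performance with cross-validation.&lt;/p&gt;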
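&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough illustration of the cross-validation mentioned in A4, the sketch below splits sample indices into k folds so that each fold serves once as the validation set. &lt;b&gt;k_fold_indices&lt;/b&gt; is a hypothetical stdlib-only helper with no shuffling; it only shows the idea and is not the scikit-learn implementation.&lt;/p&gt;

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds, without shuffling."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if n_samples % k > i else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(n_samples=6, k=3):
    print(train_idx, val_idx)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;A model would be fit on each training index set and scored on the matching validation set, and the k scores averaged to estimate generalization performance.&lt;/p&gt;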
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;  &lt;b&gt;Q5: Which metric (R&amp;sup2;, MAE, RMSE) should I look at first when building a regression model?&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A5: It depends on the problem. If the size of the prediction error matters, use RMSE or MAE; to see how well the model explains the data, use R&amp;sup2;. Looking at several metrics together is preferable.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Sparta/Theory</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/70</guid>
      <comments>https://junecho.tistory.com/70#entry70comment</comments>
      <pubDate>Tue, 30 Sep 2025 21:12:24 +0900</pubDate>
    </item>
    <item>
      <title>[250929] Machine Learning 02</title>
      <link>https://junecho.tistory.com/69</link>
      <description>&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ 데이터 전처리&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The process of handling unnecessary, missing, or noisy parts of raw data and shaping the data into a form that suits the goal of the analysis&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;e.g., spam-mail filtering, image classification, speech recognition&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;  결측치 처리&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Removal&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Drop the rows or columns that contain missing values&lt;/li&gt;
&lt;li&gt;Simple, but it loses data&lt;/li&gt;
&lt;li&gt;Suitable when missing values are only a very small share of the whole dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Imputation&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Replace with the &lt;b&gt;mean or median&lt;/b&gt; &amp;rarr; widely used for numeric data; distorts the distribution relatively little (the median is the more robust choice when there are outliers)&lt;/li&gt;
&lt;li&gt;Replace with the &lt;b&gt;mode&lt;/b&gt; &amp;rarr; used for categorical data&lt;/li&gt;
&lt;li&gt;Replace with a &lt;b&gt;predictive model&lt;/b&gt; &amp;rarr; predict the missing values with a regression or classification model&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
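&lt;p data-ke-size=&quot;size16&quot;&gt;The simple imputation strategies listed above can be sketched with the standard library alone. &lt;b&gt;impute&lt;/b&gt; is a hypothetical helper, missing values are represented here by None rather than np.nan, and the sample columns are invented for illustration.&lt;/p&gt;

```python
from statistics import mean, median, mode


def impute(values, strategy):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {'mean': mean, 'median': median, 'mode': mode}[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [20, None, 30, 100]             # numeric column with one outlier
colors = ['red', 'blue', None, 'red']  # categorical column

print(impute(ages, 'mean'))     # the outlier pulls the fill value upward
print(impute(ages, 'median'))   # the median is less affected by the outlier
print(impute(colors, 'mode'))   # the most frequent category fills the gap
```

&lt;p data-ke-size=&quot;size16&quot;&gt;In this example the mean fill is 50 while the median fill is 30, which shows why the median is often preferred when a column has outliers.&lt;/p&gt;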
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  결측치 처리 코드&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;angelscript&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd

# 1) Create a toy dataset
#   - Some values are set to np.nan to create missing values.
data = {
    'A': [1, 2, np.nan, 4, 5, np.nan, 7],
    'B': [5, 4, 2, np.nan, np.nan, 3, 1],
    'C': [2, np.nan, np.nan, 6, 7, 8, 9]
}
df = pd.DataFrame(data)
df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;196&quot; data-origin-height=&quot;321&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bX3dxM/btsQVS3Vjkq/Cn394z3ODMVhVCC3QCa8s1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bX3dxM/btsQVS3Vjkq/Cn394z3ODMVhVCC3QCa8s1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bX3dxM/btsQVS3Vjkq/Cn394z3ODMVhVCC3QCa8s1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbX3dxM%2FbtsQVS3Vjkq%2FCn394z3ODMVhVCC3QCa8s1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;196&quot; height=&quot;321&quot; data-origin-width=&quot;196&quot; data-origin-height=&quot;321&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  .dropna()&lt;/b&gt; : removes every row that has at least one missing value&lt;/p&gt;
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# 2) Drop missing values (remove every row that has at least one NaN)
df_drop = df.dropna()
df_drop
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;147&quot; data-origin-height=&quot;113&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/diioat/btsQVpOEGyd/hvkkL7L4FQkVeVcr4N9JDK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/diioat/btsQVpOEGyd/hvkkL7L4FQkVeVcr4N9JDK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/diioat/btsQVpOEGyd/hvkkL7L4FQkVeVcr4N9JDK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdiioat%2FbtsQVpOEGyd%2FhvkkL7L4FQkVeVcr4N9JDK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;147&quot; height=&quot;113&quot; data-origin-width=&quot;147&quot; data-origin-height=&quot;113&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
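&lt;p data-ke-size=&quot;size16&quot;&gt;dropna() can also drop columns or apply a looser rule. The sketch below rebuilds the same toy data under the assumed name df_demo and compares a few standard pandas options (axis, thresh, subset).&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Same toy data as above, rebuilt here so the snippet runs on its own.
df_demo = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5, np.nan, 7],
    'B': [5, 4, 2, np.nan, np.nan, 3, 1],
    'C': [2, np.nan, np.nan, 6, 7, 8, 9],
})

rows_complete = df_demo.dropna()            # keep rows with no NaN at all
cols_complete = df_demo.dropna(axis=1)      # keep columns with no NaN at all
rows_thresh = df_demo.dropna(thresh=2)      # keep rows with at least 2 non-missing values
rows_subset = df_demo.dropna(subset=['A'])  # only column A decides whether a row is dropped

print(len(rows_complete), cols_complete.shape[1], len(rows_thresh), len(rows_subset))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Here every column contains at least one NaN, so axis=1 removes all three columns, which shows how quickly column-wise removal can discard data.&lt;/p&gt;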
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  .fillna()&lt;/b&gt; : replaces missing values (NaN) with the specified value&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; &lt;/b&gt; numeric_only=True : restricts the calculation to numeric columns&lt;/p&gt;
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# 3) Impute with the mean
df_mean = df.copy()
df_mean = df_mean.fillna(df_mean.mean(numeric_only=True))
df_mean
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;157&quot; data-origin-height=&quot;315&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d0hlL0/btsQTQ7zjJ1/bpGtCThd1f6dCKRwyAhj1K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d0hlL0/btsQTQ7zjJ1/bpGtCThd1f6dCKRwyAhj1K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d0hlL0/btsQTQ7zjJ1/bpGtCThd1f6dCKRwyAhj1K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd0hlL0%2FbtsQTQ7zjJ1%2FbpGtCThd1f6dCKRwyAhj1K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;157&quot; height=&quot;315&quot; data-origin-width=&quot;157&quot; data-origin-height=&quot;315&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# 4) Impute with the median
df_median = df.copy()
df_median = df_median.fillna(df_median.median(numeric_only=True))
df_median
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;158&quot; data-origin-height=&quot;328&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/euM3xQ/btsQVjgFEk2/13csXY3aOio6akNKvRxB81/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/euM3xQ/btsQVjgFEk2/13csXY3aOio6akNKvRxB81/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/euM3xQ/btsQVjgFEk2/13csXY3aOio6akNKvRxB81/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FeuM3xQ%2FbtsQVjgFEk2%2F13csXY3aOio6akNKvRxB81%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;158&quot; height=&quot;328&quot; data-origin-width=&quot;158&quot; data-origin-height=&quot;328&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
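&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# 5) Impute with the mode
#   - DataFrame.mode() returns the most frequent value(s) of each column.
#   - When several values tie, mode() returns all of them in sorted order; we take the first row.
#   - In this toy data every value appears once, so the first row is simply each column's smallest value.
df_mode = df.copy()
print(df_mode.mode())  # for inspection
mode_values = df_mode.mode().iloc[0]  # take only the first row (the first mode of each column)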
df_mode = df_mode.fillna(mode_values)
df_mode
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;173&quot; data-origin-height=&quot;166&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/caDKOq/btsQVBBpmGS/qGywtKfocp7or3iBa22WX1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/caDKOq/btsQVBBpmGS/qGywtKfocp7or3iBa22WX1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/caDKOq/btsQVBBpmGS/qGywtKfocp7or3iBa22WX1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcaDKOq%2FbtsQVBBpmGS%2FqGywtKfocp7or3iBa22WX1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;173&quot; height=&quot;166&quot; data-origin-width=&quot;173&quot; data-origin-height=&quot;166&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;160&quot; data-origin-height=&quot;319&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/C2s13/btsQSR6PqLx/lkoU2SkcsTxukzVSm8kKck/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/C2s13/btsQSR6PqLx/lkoU2SkcsTxukzVSm8kKck/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/C2s13/btsQSR6PqLx/lkoU2SkcsTxukzVSm8kKck/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FC2s13%2FbtsQSR6PqLx%2FlkoU2SkcsTxukzVSm8kKck%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;160&quot; height=&quot;319&quot; data-origin-width=&quot;160&quot; data-origin-height=&quot;319&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
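&lt;p data-ke-size=&quot;size16&quot;&gt;Mode imputation is mainly meant for categorical columns, which the numeric toy data does not show. A minimal sketch with made-up categorical data (df_cat and its columns are assumptions for this example):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data with missing entries.
df_cat = pd.DataFrame({
    'color': ['red', 'blue', np.nan, 'red', 'green', np.nan, 'red'],
    'size': ['S', 'M', 'M', np.nan, 'L', 'M', 'S'],
})

# First row of mode() holds the most frequent category of each column.
mode_per_column = df_cat.mode().iloc[0]

# fillna with a Series fills each column with the value under its own name.
df_cat_filled = df_cat.fillna(mode_per_column)
print(df_cat_filled)
```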
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  이상치 탐지 &amp;amp; 제거&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Outlier : a value that deviates substantially from the normal range&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  이상치 탐지&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1️⃣ Statistical method (3&amp;sigma; rule)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Assumes the data follow a normal distribution and treats values outside mean &amp;plusmn; 3&amp;sigma; (standard deviations) as outliers&lt;/li&gt;
&lt;li&gt;Intuitive and simple, but the normality assumption may not hold&lt;/li&gt;
&lt;/ul&gt;
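&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of the 3&amp;sigma; rule with pandas. The data (200 simulated readings plus one injected value of 120) and all variable names are assumptions made for this example.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 roughly normal readings plus one injected extreme value.
rng = np.random.default_rng(0)
values = pd.Series(np.append(rng.normal(loc=50, scale=5, size=200), 120.0))

# z-score: how many standard deviations each value lies from the mean.
mean, std = values.mean(), values.std()
z_scores = (values - mean) / std

# Flag values whose absolute z-score exceeds 3.
is_outlier = z_scores.abs().gt(3)
outliers_3sigma = values[is_outlier]
print(outliers_3sigma)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Note that large outliers inflate the mean and standard deviation they are compared against, so with several extreme values the rule can miss some of them.&lt;/p&gt;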
&lt;p data-ke-size=&quot;size16&quot;&gt;2️⃣ Boxplot (IQR) criterion&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Uses the interquartile range (IQR = Q3 - Q1) and treats values outside the interval [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] as outliers&lt;/li&gt;
&lt;li&gt;Less sensitive to the shape of the distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;833&quot; data-origin-height=&quot;387&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/svD1m/btsQVmqU60C/n9zKOL9OlQ4fnOYykWUWuk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/svD1m/btsQVmqU60C/n9zKOL9OlQ4fnOYykWUWuk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/svD1m/btsQVmqU60C/n9zKOL9OlQ4fnOYykWUWuk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FsvD1m%2FbtsQVmqU60C%2Fn9zKOL9OlQ4fnOYykWUWuk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;661&quot; height=&quot;307&quot; data-origin-width=&quot;833&quot; data-origin-height=&quot;387&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3️⃣ Machine-learning based&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Outlier-detection algorithms (Isolation Forest, DBSCAN, etc.)&lt;/li&gt;
&lt;li&gt;Can take complex, multivariate patterns into account&lt;/li&gt;
&lt;/ul&gt;
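&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough sketch of the machine-learning route, scikit-learn's IsolationForest can label points as outliers. The data and the contamination value (the assumed share of outliers) are illustrative choices, not tuned settings.&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 30 readings around 50 plus four large values.
rng = np.random.default_rng(42)
normal_part = rng.normal(loc=50, scale=5, size=30)
X = np.concatenate([normal_part, [150, 180, 200, 300]]).reshape(-1, 1)

# contamination is the assumed share of outliers (here 4 of 34 points).
model = IsolationForest(contamination=4 / 34, random_state=42)
labels = model.fit_predict(X)      # -1 = outlier, 1 = inlier

flagged = X[labels == -1].ravel()
print(sorted(flagged))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Unlike the two rules above, the model needs a contamination guess, which in practice has to come from domain knowledge.&lt;/p&gt;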
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  이상치 제거 코드&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;nix&quot;&gt;&lt;code&gt;import pandas as pd
import numpy as np

# Create an example DataFrame
np.random.seed(42)  # fix the seed for reproducibility
normal_values = np.random.normal(loc=50, scale=5, size=30)   # 30 values from a normal distribution with mean 50 and std 5
outliers = [150, 180, 200, 300]  # large values that are obvious outliers even by eye

# Combine normal_values and outliers into a single array
all_values = np.concatenate([normal_values, outliers])
# Create one daily timestamp per value (34 dates starting 2021-01-01)
dates = pd.date_range('2021-01-01', periods=len(all_values), freq='D')

df = pd.DataFrame({
    'date': dates,
    'sensor_value': all_values
})
df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;284&quot; data-origin-height=&quot;894&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/OM8tA/btsQTD1iRAU/Kjwxr1hv0ZcM5sf8uFZT11/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/OM8tA/btsQTD1iRAU/Kjwxr1hv0ZcM5sf8uFZT11/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/OM8tA/btsQTD1iRAU/Kjwxr1hv0ZcM5sf8uFZT11/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FOM8tA%2FbtsQTD1iRAU%2FKjwxr1hv0ZcM5sf8uFZT11%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;284&quot; height=&quot;894&quot; data-origin-width=&quot;284&quot; data-origin-height=&quot;894&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;pre class=&quot;makefile&quot;&gt;&lt;code&gt;# Remove outliers (simple example using the boxplot/IQR criterion)
Q1 = df['sensor_value'].quantile(0.25)
Q3 = df['sensor_value'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df = df[(df['sensor_value'] &amp;gt;= lower_bound) &amp;amp; (df['sensor_value'] &amp;lt;= upper_bound)]
df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;743&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ctSEBM/btsQTTDcRSv/Brpll5inTTXILHYDzDkHb1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ctSEBM/btsQTTDcRSv/Brpll5inTTXILHYDzDkHb1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ctSEBM/btsQTTDcRSv/Brpll5inTTXILHYDzDkHb1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FctSEBM%2FbtsQTTDcRSv%2FBrpll5inTTXILHYDzDkHb1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;279&quot; height=&quot;743&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;743&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  정규화/표준화 = 스케일링&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Why it is needed&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Depending on the model (especially distance-based algorithms and deep learning), a variable with a large scale can dominate the result&lt;/li&gt;
&lt;li&gt;e.g. if sensor A ranges over 0~1000 and sensor B over 0~1, A influences the model far more than B&lt;/li&gt;
&lt;/ul&gt;
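&lt;p data-ke-size=&quot;size16&quot;&gt;The sensor example can be made concrete with Euclidean distances. The three points p, q, r and the assumed full ranges (1000 for A, 1 for B) are made up for this sketch.&lt;/p&gt;

```python
import numpy as np

# Columns: sensor A (range 0 to 1000), sensor B (range 0 to 1).
p = np.array([100.0, 0.1])
q = np.array([110.0, 0.9])   # close to p on A, far from p on B
r = np.array([300.0, 0.1])   # far from p on A, identical to p on B

# Raw distances are dominated by sensor A.
raw_pq = np.linalg.norm(p - q)
raw_pr = np.linalg.norm(p - r)

# Divide each sensor by its assumed full range so both lie in [0, 1].
ranges = np.array([1000.0, 1.0])
scaled_pq = np.linalg.norm((p - q) / ranges)
scaled_pr = np.linalg.norm((p - r) / ranges)

print(raw_pq, raw_pr)        # q looks like the nearer neighbour
print(scaled_pq, scaled_pr)  # after scaling, r is the nearer neighbour
```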
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  정규화 (MinMaxScaler)&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
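&lt;li&gt;&lt;b&gt;: Maps all values into the range 0 to 1&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Brings features with different scales onto a common range&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Often used in &lt;b&gt;deep learning (neural networks)&lt;/b&gt; and &lt;b&gt;image processing&lt;/b&gt;, where inputs must be limited to 0~1 or every feature must lie in the same range&lt;/li&gt;
&lt;li&gt;Also used with distance-based algorithms (Euclidean distance), or when putting every feature on the same range is meant to improve numerical stability&lt;/li&gt;
&lt;li&gt;The minimum and maximum are &lt;b&gt;sensitive to outliers&lt;/b&gt;. With an extreme value, most of the data is squeezed into one side of [0, 1]&lt;/li&gt;
&lt;li&gt;New data above the fitted maximum or below the fitted minimum falls outside the scaled range, so the scaler must be refit or the values handled some other way&lt;/li&gt;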
&lt;/ul&gt;
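&lt;p data-ke-size=&quot;size16&quot;&gt;The formula behind min-max scaling is (x - min) / (max - min). The hand-written helper min_max_scale below is a hypothetical function for illustration (it is not the scikit-learn API); it shows both the outlier sensitivity and the out-of-range problem from the list above.&lt;/p&gt;

```python
import numpy as np

def min_max_scale(values, fitted_min, fitted_max):
    """Map values to [0, 1] using a previously fitted minimum and maximum."""
    return (values - fitted_min) / (fitted_max - fitted_min)

train = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
scaled = min_max_scale(train, train.min(), train.max())

# One extreme value stretches the range and squeezes the rest toward 0.
with_outlier = np.append(train, 1000.0)
squeezed = min_max_scale(with_outlier, with_outlier.min(), with_outlier.max())

# New data above the fitted maximum lands outside [0, 1].
new_value = min_max_scale(np.array([60.0]), train.min(), train.max())

print(scaled)
print(squeezed)
print(new_value)
```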
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  표준화 (StandardScaler)&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
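&lt;li&gt;: &lt;b&gt;Rescales to mean 0 and standard deviation 1&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;The shape of the distribution does not change: the data is only shifted and rescaled, so skewed data stays skewed&lt;/li&gt;
&lt;li&gt;Because every feature ends up with mean 0 and standard deviation 1, it is often used with algorithms that are &lt;b&gt;sensitive to feature scale&lt;/b&gt; (regularized linear regression, logistic regression, SVM, etc.)&lt;/li&gt;
&lt;li&gt;Transformed values can in theory range over -&lt;b&gt;&amp;infin; ~ +&amp;infin;&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;The data is not confined to a fixed interval such as [0, 1]&lt;/li&gt;
&lt;li&gt;If the distribution is &lt;b&gt;heavily skewed&lt;/b&gt;, the mean and standard deviation alone may not scale it well (consider a log transform, RobustScaler, etc.)&lt;/li&gt;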
&lt;/ul&gt;
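&lt;p data-ke-size=&quot;size16&quot;&gt;Standardization by hand is z = (x - mean) / std. The toy array below is an assumption for illustration; it also shows that the transformation is linear, so relative gaps between values (and any skew) are preserved.&lt;/p&gt;

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # right-skewed toy data

# np.std defaults to the population formula (ddof=0).
z = (x - x.mean()) / x.std()

print(z.mean(), z.std())   # approximately 0 and 1

# The largest gap is still 96 times the smallest gap, exactly as in x.
gap_ratio_before = (x[4] - x[3]) / (x[1] - x[0])
gap_ratio_after = (z[4] - z[3]) / (z[1] - z[0])
print(gap_ratio_before, gap_ratio_after)
```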
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  정규화/표준화 코드&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;autoit&quot;&gt;&lt;code&gt;import pandas as pd
import numpy as np

# Fix the seed so the random numbers are reproducible
np.random.seed(42)

# Create example marketing-metric data
data_size = 10
df = pd.DataFrame({
    'impressions': np.random.randint(1000, 10000, size=data_size), # number of ad impressions
    'clicks': np.random.randint(0, 300, size=data_size), # number of ad clicks
    'conversions': np.random.randint(0, 50, size=data_size), # number of purchases made through the ad
    'cost': np.random.randint(100, 5000, size=data_size), # ad spend
    'revenue': np.random.randint(100, 10000, size=data_size) # revenue generated through the ad
})
df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;509&quot; data-origin-height=&quot;466&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/HGDic/btsQTnq394I/0xXgeEmFb0crUyz1Itviq1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/HGDic/btsQTnq394I/0xXgeEmFb0crUyz1Itviq1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/HGDic/btsQTnq394I/0xXgeEmFb0crUyz1Itviq1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHGDic%2FbtsQTnq394I%2F0xXgeEmFb0crUyz1Itviq1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;509&quot; height=&quot;466&quot; data-origin-width=&quot;509&quot; data-origin-height=&quot;466&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  MinMaxScaler()&lt;/b&gt; : &lt;b&gt;Normalization&lt;/b&gt;. Rescales each feature to &lt;b&gt;a given range, by default 0 to 1&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

# Select only the columns to scale
cols_to_scale = ['impressions', 'clicks', 'conversions', 'cost', 'revenue']

# Create a MinMaxScaler (default range: [0, 1])
minmax_scaler = MinMaxScaler()

# Scale with fit_transform and convert the result (a NumPy array) back to a DataFrame
df_minmax_scaled = pd.DataFrame(minmax_scaler.fit_transform(df[cols_to_scale]), 
                                columns=cols_to_scale)                            
print(df_minmax_scaled.max())
print(df_minmax_scaled.min())
                     
df_minmax_scaled
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;547&quot; data-origin-height=&quot;685&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cVo2BW/btsQTNC0PV8/3BzOzkTGlQFJkmKvpshagk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cVo2BW/btsQTNC0PV8/3BzOzkTGlQFJkmKvpshagk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cVo2BW/btsQTNC0PV8/3BzOzkTGlQFJkmKvpshagk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcVo2BW%2FbtsQTNC0PV8%2F3BzOzkTGlQFJkmKvpshagk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;547&quot; height=&quot;685&quot; data-origin-width=&quot;547&quot; data-origin-height=&quot;685&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;fit_transform() in &lt;b&gt;MinMaxScaler()&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Runs the two steps below in one call&lt;/li&gt;
&lt;li&gt;fit : scans each column to find its minimum and maximum&lt;/li&gt;
&lt;li&gt;transform : uses that minimum and maximum to map each value into the 0~1 range&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
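&lt;p data-ke-size=&quot;size16&quot;&gt;Splitting fit and transform matters once there is separate training and test data: the scaler is fit on the training data only and then reused. A minimal sketch with made-up arrays (train and test are assumptions for this example):&lt;/p&gt;

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
scaler.fit(train)                       # learns min=10 and max=30 from the training data only
train_scaled = scaler.transform(train)  # same result as scaler.fit_transform(train)
test_scaled = scaler.transform(test)    # reuses the training min and max

print(scaler.data_min_, scaler.data_max_)
print(train_scaled.ravel())
print(test_scaled.ravel())              # 40 is above the fitted max, so it maps above 1
```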
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  StandardScaler()&lt;/b&gt; : Standardization. Standardizes features by removing the mean and scaling to unit variance.&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

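# Create a StandardScaler
standard_scaler = StandardScaler()

# Scale with fit_transform and convert the result (a NumPy array) back to a DataFrame
df_standard_scaled = pd.DataFrame(standard_scaler.fit_transform(df[cols_to_scale]),
                                  columns=cols_to_scale)

print(df_standard_scaled.mean())
# pandas .std() uses the sample formula (ddof=1), so this prints about 1.054 for 10 rows;
# StandardScaler divides by the population std, so .std(ddof=0) gives 1 (up to rounding).
print(df_standard_scaled.std())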
df_standard_scaled
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;580&quot; data-origin-height=&quot;715&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Ik8jp/btsQTzYSypp/LRNZIwyo6n7oOWxTZFKzQ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Ik8jp/btsQTzYSypp/LRNZIwyo6n7oOWxTZFKzQ1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Ik8jp/btsQTzYSypp/LRNZIwyo6n7oOWxTZFKzQ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FIk8jp%2FbtsQTzYSypp%2FLRNZIwyo6n7oOWxTZFKzQ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;580&quot; height=&quot;715&quot; data-origin-width=&quot;580&quot; data-origin-height=&quot;715&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;fit_transform() in &lt;b&gt;StandardScaler()&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Runs the two steps below in one call&lt;/li&gt;
&lt;li&gt;fit : computes the mean and standard deviation of each column&lt;/li&gt;
&lt;li&gt;transform : converts each value with &amp;ldquo;(x - mean) / standard deviation&amp;rdquo;, giving a column with mean 0 and standard deviation 1&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  불균형 데이터 처리&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Cases where one class is extremely rare, e.g. 99% normal and 1% defective&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Problem : the model is likely to almost never predict the rare class (it becomes biased toward the majority class)&lt;/li&gt;
&lt;/ul&gt;
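&lt;p data-ke-size=&quot;size16&quot;&gt;The bias is easy to miss because accuracy still looks excellent. A small sketch with a made-up &quot;model&quot; that always predicts the majority class (the arrays are assumptions for illustration):&lt;/p&gt;

```python
import numpy as np

y_true = np.array([0] * 99 + [1] * 1)   # 99 normal, 1 defective
y_pred = np.zeros_like(y_true)          # a "model" that always predicts normal

# Accuracy: share of predictions that match the true label.
accuracy = (y_pred == y_true).mean()

# Recall for the defective class: share of true defects that were predicted as defects.
defects_found = int(((y_pred == 1) * (y_true == 1)).sum())
recall = defects_found / int((y_true == 1).sum())

print(accuracy, recall)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Accuracy is 0.99 while recall for the defective class is 0, which is why metrics such as recall or F1 are usually checked alongside accuracy on imbalanced data.&lt;/p&gt;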
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  해결 기법&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Oversampling&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Random Oversampling&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Increases the minority-class count by simply duplicating its samples&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;SMOTE(Synthetic Minority Over-sampling Technique)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Instead of &quot;blindly copying&quot; minority samples, creates new samples by interpolating between &amp;ldquo;similar&amp;rdquo; ones&lt;/li&gt;
&lt;li&gt;That is, it picks a minority-class sample (e.g. spam) and one of its nearest minority neighbours, creates a new data point &lt;b&gt;between&lt;/b&gt; them, and so &lt;b&gt;synthetically&lt;/b&gt; enlarges the variety of minority examples&lt;/li&gt;
&lt;li&gt;e.g. like &amp;ldquo;pick two oranges that look and taste alike, then imagine a &lt;b&gt;new orange&lt;/b&gt; roughly halfway between them&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
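&lt;p data-ke-size=&quot;size16&quot;&gt;The interpolation idea behind SMOTE can be sketched in plain NumPy. This is a simplified illustration, not the imbalanced-learn implementation: the helper synthesize is hypothetical and always uses the single nearest neighbour, whereas real SMOTE picks among k neighbours.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical minority-class samples (two features each).
minority = np.array([[20.0, 10.0], [22.0, 11.0], [18.0, 9.0], [25.0, 12.0]])

def synthesize(points, rng):
    """Create one synthetic point between a random sample and its nearest neighbour."""
    i = rng.integers(len(points))
    distances = np.linalg.norm(points - points[i], axis=1)
    distances[i] = np.inf              # exclude the point itself
    j = int(np.argmin(distances))      # nearest neighbour
    gap = rng.random()                 # random position in [0, 1) along the segment
    return points[i] + gap * (points[j] - points[i])

synthetic = np.array([synthesize(minority, rng) for _ in range(6)])
print(synthetic)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Every synthetic point lies on a line segment between two real minority samples, so new points never leave the region the minority class already covers.&lt;/p&gt;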
&lt;p data-ke-size=&quot;size16&quot;&gt;2️⃣ &lt;b&gt;Undersampling&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Reduces the number of majority-class samples&lt;/li&gt;
&lt;li&gt;Risks losing information, but balances the classes overall&lt;/li&gt;
&lt;/ul&gt;
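&lt;p data-ke-size=&quot;size16&quot;&gt;Random undersampling needs nothing beyond pandas. In this sketch the frame df_imb and its columns are made up; the majority class is sampled down to the size of the minority class.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical imbalanced data: 90 normal rows, 10 defective rows.
df_imb = pd.DataFrame({
    'feature': np.arange(100, dtype=float),
    'defect': [0] * 90 + [1] * 10,
})

minority_count = int(df_imb['defect'].eq(1).sum())

# Keep all minority rows and a random subset of majority rows of the same size.
majority = df_imb[df_imb['defect'].eq(0)].sample(n=minority_count, random_state=42)
minority = df_imb[df_imb['defect'].eq(1)]

# Concatenate and shuffle the rows.
df_under = pd.concat([majority, minority]).sample(frac=1, random_state=42)
print(df_under['defect'].value_counts().to_dict())
```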
&lt;p data-ke-size=&quot;size16&quot;&gt;3️⃣ Hybrid methods&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: Combine SMOTE with undersampling in suitable proportions&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  불균형 데이터 처리 코드&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd

# Fix the random seed (reproducibility)
np.random.seed(42)

# Set the sizes of the imbalanced classes
# e.g. out of 100 samples, 10 with defect=1 (defective) and 90 with defect=0 (normal)
size_1 = 10
size_0 = 90

# Generate the normal class (defect=0)
feature1_0 = np.random.normal(loc=10, scale=2, size=size_0)
feature2_0 = np.random.normal(loc=5, scale=1, size=size_0)

# Generate the defective class (defect=1)
feature1_1 = np.random.normal(loc=20, scale=5, size=size_1)
feature2_1 = np.random.normal(loc=10, scale=2, size=size_1)

# Merge the arrays
feature1 = np.concatenate([feature1_0, feature1_1])
feature2 = np.concatenate([feature2_0, feature2_1])
defect = np.array([0]*size_0 + [1]*size_1)

# Build the DataFrame
df = pd.DataFrame({
    'feature1': feature1,
    'feature2': feature2,
    'defect': defect
})

df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;315&quot; data-origin-height=&quot;509&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/swI7g/btsQVYb04Zo/OvYV3kEvUHdQvIilzOiGk0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/swI7g/btsQVYb04Zo/OvYV3kEvUHdQvIilzOiGk0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/swI7g/btsQVYb04Zo/OvYV3kEvUHdQvIilzOiGk0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FswI7g%2FbtsQVYb04Zo%2FOvYV3kEvUHdQvIilzOiGk0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;286&quot; height=&quot;462&quot; data-origin-width=&quot;315&quot; data-origin-height=&quot;509&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;714&quot; data-origin-height=&quot;559&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bY4gLr/btsQVsq74CN/YK7BnK4lLg9UJDWEfESIs1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bY4gLr/btsQVsq74CN/YK7BnK4lLg9UJDWEfESIs1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bY4gLr/btsQVsq74CN/YK7BnK4lLg9UJDWEfESIs1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbY4gLr%2FbtsQVsq74CN%2FYK7BnK4lLg9UJDWEfESIs1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;611&quot; height=&quot;478&quot; data-origin-width=&quot;714&quot; data-origin-height=&quot;559&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  SMOTE()&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;nix&quot;&gt;&lt;code&gt;from imblearn.over_sampling import SMOTE
# Handle imbalanced data (SMOTE)
X = df.drop('defect', axis=1)   # apply after preprocessing (missing values, outliers, encoding, etc.)
y = df['defect']
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X, y)
&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;fit_resample
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
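&lt;li&gt;: The SMOTE algorithm automatically generates synthetic minority-class samples based on X and y&lt;/li&gt;
&lt;li&gt;In the oversampled X_res and y_res, the class imbalance is &lt;b&gt;reduced&lt;/b&gt; (close to 1:1, or at the requested ratio)&lt;/li&gt;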
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
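&lt;p data-ke-size=&quot;size16&quot;&gt;The interpolation idea behind SMOTE can be sketched by hand as follows. This is a simplified NumPy-only illustration, not the actual imblearn implementation; the function name smote_sketch, its parameters, and the sample data are all hypothetical.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=42):
    # Simplified SMOTE idea: create points between a minority sample and one of its neighbors
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances among the minority samples
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                   # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest neighbors
    base = rng.integers(0, len(X_min), size=n_new)   # randomly chosen minority samples
    pick = rng.integers(0, k, size=n_new)            # one of the k neighbors for each
    neigh = neighbors[base, pick]
    gap = rng.random((n_new, 1))                     # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# Hypothetical minority class: 10 samples with 2 features
rng = np.random.default_rng(0)
X_minority = rng.normal(loc=[20, 10], scale=[5, 2], size=(10, 2))
X_new = smote_sketch(X_minority, n_new=80)   # 10 + 80 = 90, matching a 90-sample majority class
print(X_new.shape)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because every synthetic point lies on a line segment between two existing minority samples, the new data stays inside the region the minority class already occupies.&lt;/p&gt;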
&lt;pre class=&quot;nginx&quot;&gt;&lt;code&gt;X_res
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;266&quot; data-origin-height=&quot;517&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c26ct3/btsQVskkqAL/3l2fx78W6WEAbKNFBJKTkK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c26ct3/btsQVskkqAL/3l2fx78W6WEAbKNFBJKTkK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c26ct3/btsQVskkqAL/3l2fx78W6WEAbKNFBJKTkK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc26ct3%2FbtsQVskkqAL%2F3l2fx78W6WEAbKNFBJKTkK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;236&quot; height=&quot;459&quot; data-origin-width=&quot;266&quot; data-origin-height=&quot;517&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;pre class=&quot;nginx&quot;&gt;&lt;code&gt;y_res
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;201&quot; data-origin-height=&quot;522&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c0JkOd/btsQVsdy1B1/MRItQ0YERHWfEPxGJtBAGk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c0JkOd/btsQVsdy1B1/MRItQ0YERHWfEPxGJtBAGk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c0JkOd/btsQVsdy1B1/MRItQ0YERHWfEPxGJtBAGk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc0JkOd%2FbtsQVsdy1B1%2FMRItQ0YERHWfEPxGJtBAGk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;161&quot; height=&quot;418&quot; data-origin-width=&quot;201&quot; data-origin-height=&quot;522&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;pre class=&quot;css&quot;&gt;&lt;code&gt;y_res.hist()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;695&quot; data-origin-height=&quot;518&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cIRPJS/btsQVU8vvVa/D3D10pOTaFF2ZM6gKHKipK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cIRPJS/btsQVU8vvVa/D3D10pOTaFF2ZM6gKHKipK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cIRPJS/btsQVU8vvVa/D3D10pOTaFF2ZM6gKHKipK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcIRPJS%2FbtsQVU8vvVa%2FD3D10pOTaFF2ZM6gKHKipK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;518&quot; height=&quot;386&quot; data-origin-width=&quot;695&quot; data-origin-height=&quot;518&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  범주형 데이터 변환&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;pre class=&quot;autoit&quot;&gt;&lt;code&gt;import pandas as pd
import numpy as np

# Create an example DataFrame
data_size = 10
np.random.seed(42)

labels = ['apple', 'banana', 'cherry']
random_labels = np.random.choice(labels, data_size)

df = pd.DataFrame({
    'id': range(1, data_size + 1),
    'label': random_labels,
    'value': np.random.randint(1, 100, data_size),
    'another_feature': np.random.choice(['A', 'B'], data_size)  # another categorical variable
})

df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;387&quot; data-origin-height=&quot;446&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bAaMGZ/btsQUktNTi3/GpXKNmXZf4UkkRQtzSWoRK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bAaMGZ/btsQUktNTi3/GpXKNmXZf4UkkRQtzSWoRK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bAaMGZ/btsQUktNTi3/GpXKNmXZf4UkkRQtzSWoRK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbAaMGZ%2FbtsQUktNTi3%2FGpXKNmXZf4UkkRQtzSWoRK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;326&quot; height=&quot;376&quot; data-origin-width=&quot;387&quot; data-origin-height=&quot;446&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  원-핫 인코딩 (One-Hot Encoding)&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;범주형 변수를 각각의 범주별로 새로운 열로 표현, 해당 범주에 해당하면 1, 아니면 0&lt;/li&gt;
&lt;li&gt;ex) 색상(&amp;lsquo;Red&amp;rsquo;, &amp;lsquo;Blue&amp;rsquo;, &amp;lsquo;Green&amp;rsquo;) &amp;rarr; &amp;lsquo;Red=1,Blue=0,Green=0&amp;rsquo; / &amp;lsquo;Red=0,Blue=1,Green=0&amp;rsquo; / &amp;hellip;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;장점 :&lt;/b&gt; 범주 간 서열 관계가 없을 때 사용하기 좋음&lt;/li&gt;
&lt;li&gt;&lt;b&gt;단점 :&lt;/b&gt; 범주가 매우 많으면 차원이 커짐&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  pd.get_dummies()&lt;/b&gt;&lt;/p&gt;
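&lt;pre class=&quot;ini&quot;&gt;&lt;code&gt;# Convert a categorical variable (one-hot encoding example)
# A new variable keeps the original 'label' column in df for later use
df_onehot = pd.get_dummies(df, columns=['label'])
df_onehot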
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;719&quot; data-origin-height=&quot;442&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/JPhOw/btsQVGWYwes/ul6518KaxCTJ2CRgPKkgN0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/JPhOw/btsQVGWYwes/ul6518KaxCTJ2CRgPKkgN0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/JPhOw/btsQVGWYwes/ul6518KaxCTJ2CRgPKkgN0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FJPhOw%2FbtsQVGWYwes%2Ful6518KaxCTJ2CRgPKkgN0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;599&quot; height=&quot;368&quot; data-origin-width=&quot;719&quot; data-origin-height=&quot;442&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
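&lt;li&gt;pd.get_dummies(df, columns=[&quot;column_name&quot;]) : creates a separate column for each category in the column, mapping 1 to the rows that match and 0 to the rest (recent pandas versions return True/False unless dtype=int is passed)&lt;/li&gt;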
&lt;/ul&gt;
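&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of two commonly used pd.get_dummies() options, dtype and drop_first. The colors DataFrame is hypothetical example data, not part of the notes above.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Hypothetical example data
colors = pd.DataFrame({'color': ['Red', 'Blue', 'Green', 'Red']})

# dtype=int yields 0/1 columns instead of True/False
onehot = pd.get_dummies(colors, columns=['color'], dtype=int)
print(onehot.columns.tolist())   # one column per category

# drop_first=True drops the first category column, since it can be inferred from the others
onehot_reduced = pd.get_dummies(colors, columns=['color'], dtype=int, drop_first=True)
print(onehot_reduced.columns.tolist())
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Dropping one column is mainly useful for linear models, where the full set of one-hot columns is perfectly collinear.&lt;/p&gt;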
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;  &lt;b&gt;레이블 인코딩 (Label Encoding)&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;범주를 숫자로 직접 맵핑(&amp;lsquo;M&amp;rsquo;=0, &amp;lsquo;L&amp;rsquo;=1, &amp;lsquo;XL&amp;rsquo;=2 등)&lt;/li&gt;
&lt;li&gt;단순하지만, 모델이 숫자의 크기를 서열 정보로 잘못 해석할 수 있음&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;  LabelEncoder()&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;capnproto&quot;&gt;&lt;code&gt;# Convert a categorical variable (label encoding example)
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df[&quot;label&quot;] = encoder.fit_transform(df[&quot;label&quot;])
df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;378&quot; data-origin-height=&quot;436&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Cm5zh/btsQTmFuvwf/y4FgNx8Fm3j7yHLWN1adc0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Cm5zh/btsQTmFuvwf/y4FgNx8Fm3j7yHLWN1adc0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Cm5zh/btsQTmFuvwf/y4FgNx8Fm3j7yHLWN1adc0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FCm5zh%2FbtsQTmFuvwf%2Fy4FgNx8Fm3j7yHLWN1adc0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;334&quot; height=&quot;385&quot; data-origin-width=&quot;378&quot; data-origin-height=&quot;436&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
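&lt;p data-ke-size=&quot;size16&quot;&gt;LabelEncoder assigns integers in the sorted order of the category names, so when the categories have a real order, an explicit mapping may be safer. A minimal pandas sketch, assuming a hypothetical series of clothing sizes (an alternative to LabelEncoder, not the method used above):&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import pandas as pd

# Hypothetical ordered categories
sizes = pd.Series(['M', 'XL', 'L', 'M'])

# An explicit mapping keeps the intended order M, L, XL
size_order = {'M': 0, 'L': 1, 'XL': 2}
encoded = sizes.map(size_order)
print(encoded.tolist())     # [0, 2, 1, 0]

# The same result with an ordered categorical
cat = pd.Categorical(sizes, categories=['M', 'L', 'XL'], ordered=True)
print(cat.codes.tolist())   # [0, 2, 1, 0]
&lt;/code&gt;&lt;/pre&gt;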
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #9feec3;&quot;&gt;&lt;b&gt;  피처 엔지니어링&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;모델 성능 향상을 위해 기존 데이터를 변형, 조합해 새로운 특성(피처)를 만드는 작업&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;복잡한 데이터 구조 안에 존재하는 패턴을 효과적으로 추출해 모델이 쉽게 학습하게 함&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;  실습 예시&lt;/b&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ 파생 변수 생성&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;날짜 파생 변수
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) 측정 시간이 &amp;lsquo;2025-02-24 10:35:00&amp;rsquo;이라면, &amp;lsquo;월(2)&amp;rsquo;, &amp;lsquo;요일(월=1)&amp;rsquo;, &amp;lsquo;시(10)&amp;rsquo;, &amp;lsquo;주말여부(0/1)&amp;rsquo; 등으로 분해&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;수치형 변수 조합
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) &amp;lsquo;온도&amp;rsquo;와 &amp;lsquo;습도&amp;rsquo;가 있을 때, 새로운 피처 &amp;lsquo;온도&amp;times;습도(TEMP&amp;times;HUMID)&amp;rsquo;를 추가 &amp;rarr; 두 변수의 상호작용이 불량 발생에 영향을 줄 수 있음&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;로그 변환, 제곱근 변환 등
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: 분포가 매우 치우친 변수(오른쪽 꼬리가 긴 경우)에 로그 변환을 적용하여 정규성에 가까워지도록 조정&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
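&lt;p data-ke-size=&quot;size16&quot;&gt;The date decomposition and log transform described above could look roughly like this in pandas. The column names measured_at and count and the sample values are hypothetical. Note that pandas numbers the days of the week from Monday=0, not Monday=1.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd

df_time = pd.DataFrame({
    'measured_at': pd.to_datetime(['2025-02-24 10:35:00', '2025-03-01 22:05:00']),
    'count': [3, 12000],   # hypothetical right-skewed variable
})

# Date-derived variables
df_time['month'] = df_time['measured_at'].dt.month
df_time['dayofweek'] = df_time['measured_at'].dt.dayofweek   # Monday=0, Sunday=6
df_time['hour'] = df_time['measured_at'].dt.hour
df_time['is_weekend'] = df_time['measured_at'].dt.dayofweek.isin([5, 6]).astype(int)

# log1p computes log(1 + x), so zero values are handled safely
df_time['count_log'] = np.log1p(df_time['count'])
print(df_time)
&lt;/code&gt;&lt;/pre&gt;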
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;2️⃣ Derived variable code example&lt;/b&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;import pandas as pd
import numpy as np

np.random.seed(42)  # fix the seed for reproducibility

# Generate 10 data samples
data_size = 10

# Date/time column (example)
dates = pd.date_range(start=&quot;2023-01-01&quot;, periods=data_size, freq='D')

# Temperature (&amp;deg;C): integers from 15 to 35
temperature = np.random.randint(15, 36, size=data_size)

# Humidity (%): integers from 30 to 90
humidity = np.random.randint(30, 91, size=data_size)

df = pd.DataFrame({
    'date': dates,
    'temperature': temperature,
    'humidity': humidity
})

df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;376&quot; data-origin-height=&quot;437&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bGo4lm/btsQUMwEMqv/Iz5R19hEk7PHaO3qu5yz3k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bGo4lm/btsQUMwEMqv/Iz5R19hEk7PHaO3qu5yz3k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bGo4lm/btsQUMwEMqv/Iz5R19hEk7PHaO3qu5yz3k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbGo4lm%2FbtsQUMwEMqv%2FIz5R19hEk7PHaO3qu5yz3k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;301&quot; height=&quot;350&quot; data-origin-width=&quot;376&quot; data-origin-height=&quot;437&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;pre class=&quot;prolog&quot;&gt;&lt;code&gt;# Feature engineering (interaction between temperature and humidity)
df['temp_humid_interaction'] = df['temperature'] * df['humidity']
df
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;613&quot; data-origin-height=&quot;436&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/K1pJX/btsQVuiaAHB/i9Ggd1syS9OCiaAwoVUzRk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/K1pJX/btsQVuiaAHB/i9Ggd1syS9OCiaAwoVUzRk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/K1pJX/btsQVuiaAHB/i9Ggd1syS9OCiaAwoVUzRk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FK1pJX%2FbtsQVuiaAHB%2Fi9Ggd1syS9OCiaAwoVUzRk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;501&quot; height=&quot;356&quot; data-origin-width=&quot;613&quot; data-origin-height=&quot;436&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;3️⃣ Feature Selection&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Correlation
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;When two variables are highly correlated, suspect multicollinearity.&lt;/li&gt;
&lt;li&gt;They may carry largely redundant information, so consider keeping only one of them or removing both&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;VIF (Variance Inflation Factor)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: A metric that quantifies &lt;b&gt;how much one variable overlaps (correlates) with the other variables&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Used to detect multicollinearity in regression analysis&lt;/li&gt;
&lt;li&gt;If the VIF exceeds a threshold (e.g., 10 or more), resolve the issue by removing the variable or by &lt;b&gt;combining (transforming)&lt;/b&gt; similar variables&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Model-based importance (Feature Importance)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Train a tree-based model (random forest, XGBoost, etc.), then remove the variables with low importance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
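&lt;p data-ke-size=&quot;size16&quot;&gt;A rough sketch of checking correlation and VIF with NumPy and pandas only. The data (area, rooms, age) is randomly generated and hypothetical. The VIF of variable j is 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other variables; this sketch uses the fact that the same value appears on the diagonal of the inverse correlation matrix.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200
area = rng.normal(80, 20, n)                 # hypothetical house area
rooms = area / 20 + rng.normal(0, 0.2, n)    # almost proportional to area
age = rng.normal(15, 5, n)                   # unrelated variable
X = pd.DataFrame({'area': area, 'rooms': rooms, 'age': age})

# Pairwise correlation between the variables
corr = X.corr()

# VIF for each variable: diagonal of the inverse correlation matrix
vif = pd.Series(np.diag(np.linalg.inv(corr.values)), index=X.columns)

print(corr.round(2))
print(vif.round(1))
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In this setup area and rooms should show a VIF well above 10, while age stays close to 1.&lt;/p&gt;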
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;4️⃣ Adding interactions between variables&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Generating polynomial / cross terms
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) Quadratic features (degree-2 polynomial)&lt;/li&gt;
&lt;li&gt;In manufacturing processes, variables such as temperature, pressure, and speed often become meaningful only when multiplied together&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
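&lt;p data-ke-size=&quot;size16&quot;&gt;Degree-2 terms can be built by hand as in the sketch below. The proc DataFrame and its column names are hypothetical; scikit-learn's PolynomialFeatures offers the same kind of expansion, but this version uses only pandas and the standard library.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;from itertools import combinations_with_replacement
import pandas as pd

# Hypothetical process measurements
proc = pd.DataFrame({
    'temp': [20.0, 25.0, 30.0],
    'pressure': [1.0, 1.2, 1.5],
})

# Degree-2 terms: squares (temp*temp) and cross terms (temp*pressure)
poly = proc.copy()
for a, b in combinations_with_replacement(proc.columns, 2):
    poly[f'{a}*{b}'] = proc[a] * proc[b]

print(poly.columns.tolist())
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The number of generated columns grows quickly with the number of input variables, so it usually makes sense to combine this step with feature selection.&lt;/p&gt;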
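&lt;p data-ke-size=&quot;size16&quot;&gt;❓ &lt;b&gt;Multicollinearity&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Regression analysis (house price prediction, sales prediction, etc.) uses several explanatory (independent) variables&lt;/li&gt;
&lt;li&gt;A problem arises when these variables &lt;b&gt;carry overly similar information&lt;/b&gt; (i.e., they are strongly &lt;b&gt;correlated&lt;/b&gt; with each other), which confuses the model&lt;/li&gt;
&lt;li&gt;Multicollinearity undermines both the interpretation of the regression coefficients (model parameters) and the stability of the model
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) If house size (㎡) and the number of rooms are almost directly proportional, including both adds a lot of &lt;b&gt;overlapping&lt;/b&gt; information&lt;/li&gt;
&lt;li&gt;Suppose there are two variables, &quot;number of rooms&quot; and &quot;floor area (㎡)&quot;&lt;/li&gt;
&lt;li&gt;A house with 5 rooms usually has a large floor area, and a house with 1 room is usually small (the two are &lt;b&gt;highly correlated&lt;/b&gt;).&lt;/li&gt;
&lt;li&gt;If both go into the regression, the model effectively receives &quot;&lt;b&gt;similar information twice&lt;/b&gt;&quot;, so it becomes hard to &lt;b&gt;separate&lt;/b&gt; &lt;b&gt;how much&lt;/b&gt; each variable affects the house price (its independent contribution)&lt;/li&gt;
&lt;li&gt;In such cases, the VIF comes out &lt;b&gt;high&lt;/b&gt;.&lt;/li&gt;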
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>Sparta/Theory</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/69</guid>
      <comments>https://junecho.tistory.com/69#entry69comment</comments>
      <pubDate>Mon, 29 Sep 2025 20:50:19 +0900</pubDate>
    </item>
    <item>
      <title>[250929] Machine Learning 01</title>
      <link>https://junecho.tistory.com/68</link>
      <description>&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ 머신러닝 ?&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;컴퓨터가 &lt;b&gt;인간의 개입 없이(또는 최소한으로)&lt;/b&gt; &lt;b&gt;데이터를 학습하여&lt;/b&gt; &lt;b&gt;패턴을 찾아내고&lt;/b&gt;, 새로운 데이터에 대해 &lt;b&gt;예측&lt;/b&gt;이나 &lt;b&gt;분류&lt;/b&gt;를 수행하는 기술&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ex)&amp;nbsp;스팸&amp;nbsp;메일&amp;nbsp;필터링,&amp;nbsp;이미지&amp;nbsp;분류,&amp;nbsp;음성&amp;nbsp;인식&amp;nbsp;등 &lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  머신러닝의 3대 요소&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;데이터 (Data)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: 데이터가 참고하는 정보의 모음&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;알고리즘 (Algorithm)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: 문제를 해결하기 위해 &lt;b&gt;순서대로 처리하는 방법&lt;/b&gt;이나 &lt;b&gt;규칙&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;=모델&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;컴퓨팅 파워 (Computing Power)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: 컴퓨터가 얼마나 빠르고 많이 일(연산)을 할 수 있는지를 나타내는 &lt;b&gt;능력치&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;딥러닝에서 중요&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  머신러닝, AI, 딥러닝&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;인공지능(AI)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;사람의 지능적인 작업을 기계가 수행하도록 만드는 광범위한 개념&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;머신러닝&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;AI를 실현하기 위한 방법 중 하나로, 데이터로부터 &lt;b&gt;특징&lt;/b&gt;이나 &lt;b&gt;규칙&lt;/b&gt;을 찾아내서 &lt;b&gt;학습 하는 것&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) 스팸 메일에는 특정 단어나 형태가 자주 등장하는 공통점(패턴)이 있을 수 있는데 이를 자동으로 스팸으로 분류&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;딥러닝(Deep Learning)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;머신러닝의 하위 분야로, 사람의 뇌신경을 본 떠 만든 &lt;b&gt;인공신경망&lt;/b&gt;으로 이루어져 있음&lt;/li&gt;
&lt;li&gt;인공신경망을 &lt;b&gt;여러 겹&lt;/b&gt; 쌓아서 복잡한 정보를 학습할 수 있음&lt;/li&gt;
&lt;li&gt;ex) 오늘날 많이 유명한 모델들이 여기에 속함 : ChatGPT, 알파고, 알파스타, DALL-E&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;818&quot; data-origin-height=&quot;509&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bYsRl3/btsQWBneWob/3m9mPrn37iztoojx0OYch0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bYsRl3/btsQWBneWob/3m9mPrn37iztoojx0OYch0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bYsRl3/btsQWBneWob/3m9mPrn37iztoojx0OYch0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbYsRl3%2FbtsQWBneWob%2F3m9mPrn37iztoojx0OYch0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;582&quot; height=&quot;362&quot; data-origin-width=&quot;818&quot; data-origin-height=&quot;509&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  대량의 데이터 처리와 분석&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;현대 사회는 매순간 엄청난 양의 데이터를 생성
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;대규모 데이터를 &lt;b&gt;빠르고 정확&lt;/b&gt;하게 분석하여, 복잡한 상관관계를 발견하고 예측&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;기존 방식으로는 처리하기 어려웠던 &lt;b&gt;빅데이터&lt;/b&gt; 활용 가능
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) SNS에 쏟아지는 게시글, 대형 쇼핑몰의 상품 거래 기록 등&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;❓ 빅데이터&lt;/b&gt; &amp;rarr; 일반적인 방법으로는 &lt;b&gt;저장&amp;middot;분석하기 힘들 만큼&lt;/b&gt; 방대한 양의 데이터&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt; &lt;/b&gt; &lt;b&gt;머신러닝 vs 기존 통계 분석 &lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;통계 분석&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;가설 검증, 추론 (ex. &quot;이 변수와 저 변수 사이에 유의한 관계가 있는가?&quot;)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;주로 &quot;왜?&quot;라는 질문에 집중&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;머신러닝&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;예측&lt;/b&gt; (얼마나 정확하게 미래나 미지의 데이터를 예측할 수 있는가)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;&quot;얼마나 잘?&quot;에 집중&lt;/b&gt; (정확도, 재현율 등)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
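&lt;p data-ke-size=&quot;size16&quot;&gt;The two metrics mentioned above can tell very different stories on imbalanced data. A minimal sketch with hypothetical labels (9 normal, 1 defect) and a hypothetical model that always predicts normal:&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np

# Hypothetical labels: 9 normal (0) and 1 defect (1)
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)   # a model that always predicts normal

# Accuracy: share of all predictions that are correct
accuracy = (y_true == y_pred).mean()

# Recall: share of actual defects that the model found
true_pos = ((y_true == 1) * (y_pred == 1)).sum()
recall = true_pos / (y_true == 1).sum()

print(accuracy)   # 0.9
print(recall)     # 0.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The accuracy looks high even though the model never detects a single defect, which is why recall is worth checking alongside it.&lt;/p&gt;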
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ 머신러닝 종류&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;  지도학습 (Supervised Learning)&lt;/b&gt;&lt;/span&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;우리가 맞다고 알고 있는 결과값을 정답값(레이블)이 있는 데이터를 학습하는 방식&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ex) 고양이 사진에는 '고양이'라는 정답(레이블)을 붙여서, 컴퓨터가 어떤 이미지가 고양이인지 학습 가능&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;분류 (Classification)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: &lt;b&gt;어느 그룹에 속하는지를 결정&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) 이메일이 스팸인지 아닌지, 은행 대출 상환 가능 여부&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;회귀 (Regression)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: &lt;b&gt;숫자로 된 결과를 예측&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) 주택 가격 예측, 주가 예측&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
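&lt;p data-ke-size=&quot;size16&quot;&gt;To make the difference between the two tasks concrete, here is a small NumPy-only sketch. All values are hypothetical, and the nearest-centroid rule is just one simple way to classify, chosen here for brevity.&lt;/p&gt;
&lt;pre class=&quot;python&quot;&gt;&lt;code&gt;import numpy as np

# Regression: the label is a number (hypothetical house area and price)
area = np.array([50.0, 60.0, 80.0, 100.0])
price = np.array([150.0, 180.0, 240.0, 300.0])
slope, intercept = np.polyfit(area, price, deg=1)   # least-squares line
pred_price = slope * 70.0 + intercept

# Classification: the label is a group (hypothetical nearest-centroid rule)
X = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
y = np.array([0, 0, 1, 1])
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
new_point = np.array([3.5, 3.9])
pred_class = int(np.argmin(np.linalg.norm(centroids - new_point, axis=1)))

print(round(float(pred_price), 1), pred_class)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In both cases the model is fitted on examples whose answers are already known, then applied to a new input.&lt;/p&gt;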
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;Unsupervised Learning&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Finds patterns in the data on its own, &lt;b&gt;without labels (correct answers)&lt;/b&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Clustering&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: a technique that &lt;b&gt;automatically groups people or things with similar characteristics&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) customer segmentation, document topic analysis&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Dimensionality Reduction&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: a technique that takes data made complex by too many features (variables) and &lt;b&gt;compresses it, keeping only the core information&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) summarizing data with hundreds of indicators into 2~3 key indicators&lt;/li&gt;
&lt;li&gt;&amp;harr; Feature selection : keeping only a subset of the original variables&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;Reinforcement Learning&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;blockquote data-ke-style=&quot;style1&quot;&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;An &lt;b&gt;agent&lt;/b&gt; interacts with an &lt;b&gt;environment&lt;/b&gt; and learns to maximize its reward&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Agent : the protagonist that does the learning; in a game it is the player, in robotics it is the robot itself&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Environment : the stage on which the agent moves and interacts&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Reward : points (praise) the agent earns for doing well, or penalties it receives for doing badly&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;ex) AlphaGo (Go), robotics, game AI&lt;/li&gt;
&lt;li&gt;Through repeated trial and error in a simulated environment, it learns &lt;b&gt;the action rule (policy) that earns the highest cumulative reward&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Just good to know for reference&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;✅ Machine Learning Modeling Process&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;❗ A machine learning project is not finished just because the model itself is good&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Understanding the whole flow, &lt;b&gt;from data collection to deployment&lt;/b&gt;, is very important&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;1. Data Collection&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Many possible methods: web crawling, sensor measurements, surveys, DB extraction, etc.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Securing high-quality data&lt;/b&gt; decides whether the project succeeds or fails&lt;/li&gt;
&lt;li&gt;ex) In manufacturing, data is collected continuously from IoT sensors installed on the process line&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;2. Preprocessing&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;1️⃣ Handling missing values&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: a missing value is a cell in the data table that is &lt;b&gt;empty&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Fill the blanks with the &lt;b&gt;mean or the most frequent value&lt;/b&gt;, or, if necessary, &lt;b&gt;drop them (delete)&lt;/b&gt; before analysis&lt;/li&gt;
&lt;/ul&gt;
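&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of the two options above (fill or drop), using only the Python standard library. The function names and the sample columns are made up for illustration; in practice a library such as pandas would usually do this.&lt;/p&gt;

```python
from statistics import mean, mode


def fill_missing(values, strategy='mean'):
    """Replace None entries with the mean or the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == 'mean' else mode(observed)
    return [fill if v is None else v for v in values]


def drop_missing(values):
    """Remove None entries instead of filling them."""
    return [v for v in values if v is not None]


# Hypothetical columns; None marks an empty cell.
weights = [60, None, 80, 70]
colors = ['red', None, 'blue', 'red']

filled_weights = fill_missing(weights)        # numeric column: use the mean
filled_colors = fill_missing(colors, 'mode')  # categorical column: use the most frequent value
kept_weights = drop_missing(weights)
print(filled_weights, filled_colors, kept_weights)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The mean only makes sense for numeric columns, so the categorical column falls back to the most frequent value.&lt;/p&gt;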
&lt;p data-ke-size=&quot;size16&quot;&gt;2️⃣ &lt;b&gt;Handling outliers&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: deal with values that &lt;b&gt;fall far outside&lt;/b&gt; the range of most of the data&lt;/li&gt;
&lt;li&gt;ex) body weights are mostly 50~100kg, so a record of 500kg is very likely an outlier caused by a &lt;b&gt;typo&lt;/b&gt; or similar mistake&lt;/li&gt;
&lt;/ul&gt;
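&lt;p data-ke-size=&quot;size16&quot;&gt;One common way to flag such values is the 1.5 x IQR rule. The notes above do not name a specific method, so the rule, the function name and the sample weights below are all assumptions chosen for illustration.&lt;/p&gt;

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v > high or low > v]


# Hypothetical body weights in kg; 500 is the suspected typo.
weights_kg = [52, 58, 61, 65, 70, 74, 80, 95, 500]
outliers = iqr_outliers(weights_kg)
cleaned = [v for v in weights_kg if v not in outliers]
print(outliers)
print(cleaned)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Whether a flagged value is then removed, corrected or kept is a separate decision that depends on the data.&lt;/p&gt;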
&lt;p data-ke-size=&quot;size16&quot;&gt;3️⃣ &lt;b&gt;Scaling&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: bringing data measured in different &lt;b&gt;units&lt;/b&gt; (ex : height in cm, weight in kg) onto a &lt;b&gt;similar scale&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) height ranges over 150~180 and weight over 50~100, so the magnitudes differ; converting both to a 0~1 range lets a machine learning algorithm treat the two values more &lt;b&gt;fairly&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
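&lt;p data-ke-size=&quot;size16&quot;&gt;The 0~1 conversion described above is min-max scaling: subtract the minimum and divide by the range. A small sketch with made-up sample values (the function name is hypothetical, and it assumes the column is not constant, otherwise the range would be zero):&lt;/p&gt;

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


heights_cm = [150, 165, 180]  # hypothetical heights
weights_kg = [50, 75, 100]    # hypothetical weights

scaled_heights = min_max_scale(heights_cm)
scaled_weights = min_max_scale(weights_kg)
print(scaled_heights)  # [0.0, 0.5, 1.0]
print(scaled_weights)  # [0.0, 0.5, 1.0]
```

&lt;p data-ke-size=&quot;size16&quot;&gt;After scaling, both columns cover the same 0~1 range even though the raw numbers were on different scales.&lt;/p&gt;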
&lt;p data-ke-size=&quot;size16&quot;&gt;4️⃣ &lt;b&gt;Categorical encoding&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;: the process of converting text information &lt;b&gt;into numbers&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;ex) one-hot encoding, label encoding, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ What is one-hot encoding?&lt;/b&gt; &amp;rarr; put 1 if the value belongs to the category, 0 otherwise&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Given the three categories &amp;lsquo;red&amp;middot;green&amp;middot;blue&amp;rsquo;&lt;/li&gt;
&lt;li&gt;red = (1,0,0), green = (0,1,0), blue = (0,0,1)&lt;/li&gt;
&lt;/ul&gt;
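&lt;p data-ke-size=&quot;size16&quot;&gt;The red/green/blue example above can be reproduced in a few lines. This is only a sketch with a hypothetical helper name; it assumes the category order is fixed in advance.&lt;/p&gt;

```python
def one_hot(value, categories):
    """Return a tuple with 1 in the position of the matching category and 0 elsewhere."""
    return tuple(1 if value == c else 0 for c in categories)


categories = ['red', 'green', 'blue']
encoded = {c: one_hot(c, categories) for c in categories}
print(encoded)
```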
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;❓ Label encoding example&lt;/b&gt; &amp;rarr; assign numbers in order&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;M=0, L=1, XL=2 &amp;hellip;&lt;/li&gt;
&lt;li&gt;However, for categories with no natural order the numbers can imply a &lt;b&gt;ranking&lt;/b&gt; that does not exist, so caution is needed&lt;/li&gt;
&lt;/ul&gt;
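&lt;p data-ke-size=&quot;size16&quot;&gt;A sketch of label encoding and of the ranking pitfall. The helper name and the sample data are hypothetical. Sizes have a real order, so the codes are meaningful; colors do not, yet they still receive ordered numbers.&lt;/p&gt;

```python
def label_encode(values, order=None):
    """Map each category to an integer; use the given order, or alphabetical order if none."""
    categories = order if order is not None else sorted(set(values))
    mapping = {c: i for i, c in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Ordered categories: the numbers reflect a real ranking (M, then L, then XL).
sizes = ['L', 'M', 'XL', 'M']
size_codes, size_map = label_encode(sizes, order=['M', 'L', 'XL'])
print(size_codes, size_map)

# Unordered categories: alphabetical codes put blue first and red last,
# an ordering a model may pick up on even though it means nothing.
colors = ['red', 'green', 'blue']
color_codes, color_map = label_encode(colors)
print(color_codes, color_map)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;For unordered categories like the colors, one-hot encoding avoids the implied ranking.&lt;/p&gt;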
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;3. Modeling&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;For supervised learning, choose a &lt;b&gt;classification/regression&lt;/b&gt; algorithm (ex: logistic regression, random forest, XGBoost, etc.)&lt;/li&gt;
&lt;li&gt;For unsupervised learning, choose a &lt;b&gt;clustering/dimensionality reduction&lt;/b&gt; algorithm (ex: K-Means, PCA, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;4. Evaluation&lt;/b&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Classification&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Accuracy, Precision, Recall, F1-score, ROC-AUC, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Regression&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;MAE, RMSE, R&amp;sup2;, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Unsupervised (clustering)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Silhouette coefficient, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
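&lt;p data-ke-size=&quot;size16&quot;&gt;To make a few of the metrics listed above concrete, here is a sketch that computes accuracy, precision, recall and F1 for binary labels, and MAE, RMSE and R&amp;sup2; for regression, from their standard definitions. The function names and the toy labels are made up, and the code assumes at least one positive prediction and one positive label so that no division by zero occurs. ROC-AUC and the silhouette coefficient are left out.&lt;/p&gt;

```python
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for numeric predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


# Toy data, invented for illustration.
accuracy, precision, recall, f1 = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
)
mae, rmse, r2 = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(accuracy, precision, recall, f1)
print(mae, rmse, r2)
```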
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;&lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;/span&gt;&lt;/h1&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Machine learning : learn patterns from data &amp;rarr; perform prediction/classification&lt;/li&gt;
&lt;li&gt;Machine learning, deep learning, AI : machine learning sits inside the broad concept of AI, and deep learning sits inside machine learning&lt;/li&gt;
&lt;li&gt;Machine learning vs statistics : predictive performance vs hypothesis testing&lt;/li&gt;
&lt;li&gt;Types of machine learning : supervised learning, unsupervised learning, reinforcement learning&lt;/li&gt;
&lt;li&gt;Modeling process : data collection &amp;rarr; preprocessing &amp;rarr; modeling &amp;rarr; evaluation &amp;rarr; optimization &amp;rarr; deployment&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Sparta/Theory</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/68</guid>
      <comments>https://junecho.tistory.com/68#entry68comment</comments>
      <pubDate>Mon, 29 Sep 2025 20:33:12 +0900</pubDate>
    </item>
    <item>
      <title>[250926] Sparta Coding Main Camp, Day 39</title>
      <link>https://junecho.tistory.com/67</link>
      <description>&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #99cefa;&quot;&gt;&lt;b&gt;CODEKATA&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Since yesterday I've been knocking out Programmers SQL problems that aren't on the code kata list&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;~ Every problem up through SQL page 4 ✔ done&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Yesterday's problems were all too easy, so there was nothing worth posting as code kata&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Today is mostly the same, but there was one trick problem, so I'm posting it&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Today's takeaway : read the problem carefully...... (so unfair)&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;00)&amp;nbsp; &lt;a href=&quot;https://school.programmers.co.kr/learn/courses/30/lessons/131532&quot;&gt;Number of Members Who Purchased Products, by Year, Month, and Gender&lt;/a&gt;&lt;/p&gt;
&lt;pre class=&quot;sql&quot;&gt;&lt;code&gt;SELECT 
    YEAR(s.sales_date) AS year, MONTH(s.sales_date) AS month, 
    i.gender, COUNT(DISTINCT s.user_id) AS users
FROM online_sale s LEFT JOIN user_info i ON s.user_id = i.user_id
WHERE i.gender = 0 OR i.gender = 1
GROUP BY YEAR(s.sales_date), MONTH(s.sales_date), i.gender
ORDER BY year, month, gender
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; ⭕&lt;/p&gt;
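&lt;p data-ke-size=&quot;size16&quot;&gt;The ridiculous part of this problem.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The GENDER&amp;nbsp;column is either empty or holds 0 or 1; 0 means male and 1 means female.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;is what the problem says, and yet the expected answer follows&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In this case, exclude rows with no gender information from the result. &amp;lt;&amp;lt;&amp;lt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I missed that line and kept wondering why my result was marked wrong, then reread the problem and saw it says to exclude them T_T Then why even mention that the column can be empty~~~~ totally fell for it&lt;/p&gt;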
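&lt;p data-ke-size=&quot;size16&quot;&gt;A small reproduction of the trap, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and SQLite has no YEAR() or MONTH(), so strftime is swapped in here; the original answer targets MySQL. The point is only that a member with an empty GENDER disappears once the NULL filter is in place.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user_info (user_id INTEGER, gender INTEGER);
    CREATE TABLE online_sale (user_id INTEGER, sales_date TEXT);
    -- user 3 has no gender information
    INSERT INTO user_info VALUES (1, 0), (2, 1), (3, NULL);
    INSERT INTO online_sale VALUES
        (1, '2022-01-05'), (1, '2022-01-20'), (2, '2022-01-07'), (3, '2022-01-09');
''')

query = '''
    SELECT CAST(strftime('%Y', s.sales_date) AS INTEGER) AS year,
           CAST(strftime('%m', s.sales_date) AS INTEGER) AS month,
           i.gender,
           COUNT(DISTINCT s.user_id) AS users
    FROM online_sale s JOIN user_info i ON s.user_id = i.user_id
    WHERE i.gender IS NOT NULL
    GROUP BY year, month, i.gender
    ORDER BY year, month, i.gender
'''
rows = conn.execute(query).fetchall()
print(rows)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;User 1 bought twice in the same month but is counted once because of COUNT(DISTINCT ...), and user 3 is dropped by the IS NOT NULL condition, which does the same job as the gender = 0 OR gender = 1 filter in the accepted query.&lt;/p&gt;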
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;div data-ke-type=&quot;moreLess&quot; data-text-more=&quot;Show more&quot; data-text-less=&quot;Close&quot;&gt;&lt;a class=&quot;btn-toggle-moreless&quot;&gt;Show more&lt;/a&gt;
&lt;div class=&quot;moreless-content&quot;&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;QCC&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Problem 1&lt;/p&gt;
&lt;pre class=&quot;sql&quot;&gt;&lt;code&gt;SELECT COUNT(business_entity_id) AS customer_count
FROM person
WHERE email_promotion = 1 OR email_promotion = 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Ugh, got this one wrong because I only skimmed the problem lol &lt;br /&gt;It was asking for the number of individual (retail) &amp;lt;&amp;lt; customers, wow ~~~ hilarious T_T &lt;br /&gt;I needed person_type = &quot;IN&quot; in the WHERE clause as well, dammit lol&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The fool who skimmed the problems, got Problem 2 right and Problem 1 wrong: that's me&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;caret-color: auto; letter-spacing: 0px;&quot;&gt;Problem 2&lt;/span&gt;&lt;/p&gt;
&lt;pre class=&quot;sql&quot;&gt;&lt;code&gt;-- During 2011-10 / orders that were not cancelled / total order quantity per customer
WITH cnt AS (
  SELECT sh.customer_id, SUM(sd.order_qty) AS cnt
  FROM sales_order_header sh LEFT JOIN sales_order_detail sd ON sh.sales_order_id = sd.sales_order_id
  WHERE sh.order_date LIKE &quot;2011-10%&quot; AND sh.status != 6
  GROUP BY sh.customer_id
),
-- Keep customers with a total quantity of 70 or more &amp;amp; join to look up the customer name
sum70 AS (
  SELECT c.customer_id, c.cnt, cus.person_id
  FROM cnt c LEFT JOIN sales_customer cus ON c.customer_id = cus.customer_id
  WHERE c.cnt &amp;gt;= 70
)
  
SELECT s.customer_id, p.first_name, p.last_name, s.cnt AS total_quantity
FROM sum70 s LEFT JOIN person p ON s.person_id = p.business_entity_id
ORDER BY s.customer_id
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Whoa, after submitting Problem 2 I got back to my seat, looked at the code again, and realized I hadn't renamed the total quantity column to total_quantity, which gave me a real scare&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;So I fixed it and submitted again, and luckily the resubmission went through, phew~~~&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p style=&quot;color: #222222; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Sparta/CODEKATA</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/67</guid>
      <comments>https://junecho.tistory.com/67#entry67comment</comments>
      <pubDate>Fri, 26 Sep 2025 15:51:48 +0900</pubDate>
    </item>
    <item>
      <title>[250925] Sparta Coding Main Camp, Day 38</title>
      <link>https://junecho.tistory.com/66</link>
      <description>&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;✅ Statistics Live Session Notes&lt;/b&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.notion.so/250917-32-i_hate_statistics-01-271c9c8dda7e806c877ac817bda5375a?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://www.notion.so/250917-32-i_hate_statistics-01-271c9c8dda7e806c877ac817bda5375a?source=copy_link&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758805683330&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[250917] 32일차 - i_hate_statistics 01 | Notion&quot; data-og-description=&quot;✅ 통계학 2가지 유형&quot; data-og-host=&quot;www.notion.so&quot; data-og-source-url=&quot;https://www.notion.so/250917-32-i_hate_statistics-01-271c9c8dda7e806c877ac817bda5375a?source=copy_link&quot; data-og-url=&quot;https://www.notion.so/250917-32-i_hate_statistics-01-271c9c8dda7e806c877ac817bda5375a&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/dN1UGh/hyZJU6d1m9/5cQD6YnLtQ1AxkYHbbVHO0/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/bi8tiK/hyZKcynjQI/Tepf6g4kN862ykhMOeDQJk/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630&quot;&gt;&lt;a href=&quot;https://www.notion.so/250917-32-i_hate_statistics-01-271c9c8dda7e806c877ac817bda5375a?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://www.notion.so/250917-32-i_hate_statistics-01-271c9c8dda7e806c877ac817bda5375a?source=copy_link&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/dN1UGh/hyZJU6d1m9/5cQD6YnLtQ1AxkYHbbVHO0/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/bi8tiK/hyZKcynjQI/Tepf6g4kN862ykhMOeDQJk/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[250917] Day 32 - i_hate_statistics 01 | Notion&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;✅ Two types of statistics&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;www.notion.so&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.notion.so/250918-33-i_hate_statistics-02-273c9c8dda7e802a83d0c1fdbe363df9?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://www.notion.so/250918-33-i_hate_statistics-02-273c9c8dda7e802a83d0c1fdbe363df9?source=copy_link&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758805685560&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[250918] 33일차 - i_hate_statistics 02 | Notion&quot; data-og-description=&quot;✅ 대표값&quot; data-og-host=&quot;www.notion.so&quot; data-og-source-url=&quot;https://www.notion.so/250918-33-i_hate_statistics-02-273c9c8dda7e802a83d0c1fdbe363df9?source=copy_link&quot; data-og-url=&quot;https://www.notion.so/250918-33-i_hate_statistics-02-273c9c8dda7e802a83d0c1fdbe363df9&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/ceC3yg/hyZJ0yBe6n/x00xqgnpWIakaTvzOUnXb0/img.png?width=458&amp;amp;height=309&amp;amp;face=0_0_458_309,https://scrap.kakaocdn.net/dn/VA9XJ/hyZJQbHwDx/m2UHKu3KaawkYSQSo5GF70/img.png?width=458&amp;amp;height=309&amp;amp;face=0_0_458_309&quot;&gt;&lt;a href=&quot;https://www.notion.so/250918-33-i_hate_statistics-02-273c9c8dda7e802a83d0c1fdbe363df9?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://www.notion.so/250918-33-i_hate_statistics-02-273c9c8dda7e802a83d0c1fdbe363df9?source=copy_link&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/ceC3yg/hyZJ0yBe6n/x00xqgnpWIakaTvzOUnXb0/img.png?width=458&amp;amp;height=309&amp;amp;face=0_0_458_309,https://scrap.kakaocdn.net/dn/VA9XJ/hyZJQbHwDx/m2UHKu3KaawkYSQSo5GF70/img.png?width=458&amp;amp;height=309&amp;amp;face=0_0_458_309');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[250918] Day 33 - i_hate_statistics 02 | Notion&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;✅ Measures of central tendency&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;www.notion.so&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.notion.so/250919-34-i_hate_statistics-03-273c9c8dda7e8030b316dcea1b23f3c2?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://www.notion.so/250919-34-i_hate_statistics-03-273c9c8dda7e8030b316dcea1b23f3c2?source=copy_link&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758805696441&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[250919] 34일차 - i_hate_statistics 03 | Notion&quot; data-og-description=&quot;✅ 확률&quot; data-og-host=&quot;www.notion.so&quot; data-og-source-url=&quot;https://www.notion.so/250919-34-i_hate_statistics-03-273c9c8dda7e8030b316dcea1b23f3c2?source=copy_link&quot; data-og-url=&quot;https://www.notion.so/250919-34-i_hate_statistics-03-273c9c8dda7e8030b316dcea1b23f3c2&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/Ds4Rj/hyZJVD3qhe/GBGui8yOFBdZzbnbfCaFMK/img.png?width=1128&amp;amp;height=676&amp;amp;face=0_0_1128_676,https://scrap.kakaocdn.net/dn/foxma/hyZJTsG66k/C9m3qHK3hU2QZAHkewK7a1/img.png?width=1128&amp;amp;height=676&amp;amp;face=0_0_1128_676&quot;&gt;&lt;a href=&quot;https://www.notion.so/250919-34-i_hate_statistics-03-273c9c8dda7e8030b316dcea1b23f3c2?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://www.notion.so/250919-34-i_hate_statistics-03-273c9c8dda7e8030b316dcea1b23f3c2?source=copy_link&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/Ds4Rj/hyZJVD3qhe/GBGui8yOFBdZzbnbfCaFMK/img.png?width=1128&amp;amp;height=676&amp;amp;face=0_0_1128_676,https://scrap.kakaocdn.net/dn/foxma/hyZJTsG66k/C9m3qHK3hU2QZAHkewK7a1/img.png?width=1128&amp;amp;height=676&amp;amp;face=0_0_1128_676');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[250919] Day 34 - i_hate_statistics 03 | Notion&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;✅ Probability&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;www.notion.so&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.notion.so/250922-35-i_hate_statistics-04-276c9c8dda7e80259b4fcf233e374146?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://www.notion.so/250922-35-i_hate_statistics-04-276c9c8dda7e80259b4fcf233e374146?source=copy_link&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758805703587&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[250922] 35일차 - i_hate_statistics 04 | Notion&quot; data-og-description=&quot;모집단 전체를 다보는 건 불가능 &amp;rarr; 표본만 관찰&quot; data-og-host=&quot;www.notion.so&quot; data-og-source-url=&quot;https://www.notion.so/250922-35-i_hate_statistics-04-276c9c8dda7e80259b4fcf233e374146?source=copy_link&quot; data-og-url=&quot;https://www.notion.so/250922-35-i_hate_statistics-04-276c9c8dda7e80259b4fcf233e374146&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/NCUyf/hyZJEDgtJ3/Ha0nTm1XuSor81n8pRhKp1/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/cBRfuY/hyZJFvoFPB/8FxfR6qF4iklnmMCQwt6Mk/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630&quot;&gt;&lt;a href=&quot;https://www.notion.so/250922-35-i_hate_statistics-04-276c9c8dda7e80259b4fcf233e374146?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://www.notion.so/250922-35-i_hate_statistics-04-276c9c8dda7e80259b4fcf233e374146?source=copy_link&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/NCUyf/hyZJEDgtJ3/Ha0nTm1XuSor81n8pRhKp1/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/cBRfuY/hyZJFvoFPB/8FxfR6qF4iklnmMCQwt6Mk/img.png?width=1200&amp;amp;height=630&amp;amp;face=0_0_1200_630');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[250922] Day 35 - i_hate_statistics 04 | Notion&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Observing the entire population is impossible &amp;rarr; observe only a sample&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;www.notion.so&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.notion.so/250924-37-i_hate_statistics-05-278c9c8dda7e808fb21bcbe95f8d5efa?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://www.notion.so/250924-37-i_hate_statistics-05-278c9c8dda7e808fb21bcbe95f8d5efa?source=copy_link&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1758805713381&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;[250924] 37일차 - i_hate_statistics 05 | Notion&quot; data-og-description=&quot;✅ 상관&quot; data-og-host=&quot;www.notion.so&quot; data-og-source-url=&quot;https://www.notion.so/250924-37-i_hate_statistics-05-278c9c8dda7e808fb21bcbe95f8d5efa?source=copy_link&quot; data-og-url=&quot;https://www.notion.so/250924-37-i_hate_statistics-05-278c9c8dda7e808fb21bcbe95f8d5efa&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/BN2XA/hyZJHUhRAY/eVSpj7JwUU7axIYwHk8iDK/img.png?width=610&amp;amp;height=374&amp;amp;face=0_0_610_374,https://scrap.kakaocdn.net/dn/EBCJR/hyZJXPnUh1/kS6yskjY0eGyOkAt2YlAn0/img.png?width=610&amp;amp;height=374&amp;amp;face=0_0_610_374&quot;&gt;&lt;a href=&quot;https://www.notion.so/250924-37-i_hate_statistics-05-278c9c8dda7e808fb21bcbe95f8d5efa?source=copy_link&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://www.notion.so/250924-37-i_hate_statistics-05-278c9c8dda7e808fb21bcbe95f8d5efa?source=copy_link&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/BN2XA/hyZJHUhRAY/eVSpj7JwUU7axIYwHk8iDK/img.png?width=610&amp;amp;height=374&amp;amp;face=0_0_610_374,https://scrap.kakaocdn.net/dn/EBCJR/hyZJXPnUh1/kS6yskjY0eGyOkAt2YlAn0/img.png?width=610&amp;amp;height=374&amp;amp;face=0_0_610_374');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;[250924] Day 37 - i_hate_statistics 05 | Notion&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;✅ Correlation&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;www.notion.so&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Sparta/Theory</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/66</guid>
      <comments>https://junecho.tistory.com/66#entry66comment</comments>
      <pubDate>Thu, 25 Sep 2025 22:08:39 +0900</pubDate>
    </item>
    <item>
      <title>[250924] Sparta Coding Main Camp, Day 37</title>
      <link>https://junecho.tistory.com/65</link>
      <description>&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;background-color: #99cefa;&quot;&gt;&lt;b&gt;CODEKATA&lt;/b&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p style=&quot;color: #222222; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;~ Problem 105&lt;/p&gt;
&lt;p style=&quot;color: #222222; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #222222; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #222222; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;101)&amp;nbsp; &lt;a href=&quot;https://leetcode.com/problems/product-sales-analysis-iii/description/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Product Sales Analysis III&lt;/a&gt;&lt;/p&gt;
&lt;pre class=&quot;sql&quot;&gt;&lt;code&gt;# First attempt
WITH first AS (
    SELECT product_id, MIN(year) AS first_year
    FROM sales
    GROUP BY product_id
)
SELECT f.product_id, f.first_year, s.quantity, s.price
FROM first f JOIN sales s ON f.product_id = s.product_id AND f.first_year = s.year
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; ⭕&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;어제 풀었던 문제랑 비슷한 양상인데 WITH 안쓰고 어떻게 해보려다가 기억이 안나서 그냥 WITH씀&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;효율 좋은건 역시나 WITH 안쓰는 쿼리였다&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;WITH 안쓰면서 다시 짜보자&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;어제 풀은 문제의 효율 1등 코드를 참고해서 짰음&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;# Second attempt
SELECT a.product_id, a.first_year, s.quantity, s.price
FROM (
    SELECT product_id, MIN(year) AS first_year
    FROM sales
    GROUP BY product_id
) a LEFT JOIN sales s ON a.product_id = s.product_id AND a.first_year = s.year
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; ⭕&lt;/p&gt;
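&lt;p data-ke-size=&quot;size16&quot;&gt;Good.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I have always pulled subqueries out into WITH rather than writing them in the FROM clause, so embedding one directly in FROM feels a bit unfamiliar.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Putting the subquery in the FROM clause is a tiny bit more efficient, but&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;honestly, isn't WITH the more readable one? (Opinion of a devout WITH believer.)&lt;/p&gt;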
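&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch to confirm that the two attempts return the same rows. It rests on assumptions that are not part of the original solution: the sample rows are made up, the queries run on an in-memory SQLite database through Python's sqlite3 module instead of the MySQL engine used on LeetCode, the CTE is renamed to first_sale, and an ORDER BY is added only to make the comparison deterministic. It says nothing about which form is faster.&lt;/p&gt;

```python
import sqlite3

# Hypothetical sample rows: (sale_id, product_id, year, quantity, price).
rows = [
    (1, 100, 2008, 10, 5000),
    (2, 100, 2009, 12, 5000),
    (7, 200, 2011, 15, 9000),
    (8, 200, 2011, 3, 9000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(sale_id INT, product_id INT, year INT, quantity INT, price INT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# Version 1: the first-year lookup lives in a CTE (WITH).
cte_query = """
WITH first_sale AS (
    SELECT product_id, MIN(year) AS first_year
    FROM sales
    GROUP BY product_id
)
SELECT f.product_id, f.first_year, s.quantity, s.price
FROM first_sale f
JOIN sales s ON f.product_id = s.product_id AND f.first_year = s.year
ORDER BY f.product_id, s.quantity
"""

# Version 2: the same lookup inlined as a derived table in FROM.
derived_query = """
SELECT a.product_id, a.first_year, s.quantity, s.price
FROM (
    SELECT product_id, MIN(year) AS first_year
    FROM sales
    GROUP BY product_id
) a
LEFT JOIN sales s ON a.product_id = s.product_id AND a.first_year = s.year
ORDER BY a.product_id, s.quantity
"""

cte_result = conn.execute(cte_query).fetchall()
derived_result = conn.execute(derived_query).fetchall()

print(cte_result == derived_result)
for row in cte_result:
    print(row)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Product 200 has two sales in its first year, so both rows come back. The LEFT JOIN in the second version behaves like a plain JOIN here, because every (product_id, first_year) pair is taken from the sales table itself and therefore always finds a match.&lt;/p&gt;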
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;105) &amp;nbsp;&lt;a href=&quot;https://leetcode.com/problems/customers-who-bought-all-products/description/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Customers Who Bought All Products&lt;/a&gt;&lt;/p&gt;
&lt;pre class=&quot;routeros&quot;&gt;&lt;code&gt;SELECT c.customer_id
FROM customer c LEFT JOIN product p ON c.product_key = p.product_key
GROUP BY c.customer_id
HAVING COUNT(DISTINCT c.product_key) = (SELECT COUNT(DISTINCT product_key) FROM product)
&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rArr; ⭕&lt;/p&gt;
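&lt;p data-ke-size=&quot;size16&quot;&gt;I figured this would be solved right away if only a subquery were allowed in HAVING, so I searched for it, and it really is allowed.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;At first I didn't apply DISTINCT to c.product_key, and the submission was marked wrong by the server tests,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;so I pulled that test case in and printed the rows one by one, and it turned out a customer can buy the same product_key more than once!&lt;/p&gt;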
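&lt;p data-ke-size=&quot;size16&quot;&gt;The duplicate-purchase case can be reproduced with a small sketch. Everything in it is an assumption for illustration: the rows are made up (this is not the actual failing test case), the helper buyers_of_everything is hypothetical, the database is in-memory SQLite via Python's sqlite3 instead of MySQL, and the LEFT JOIN to product is left out because the query only reads columns from customer.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_key INT)")
conn.execute("CREATE TABLE customer (customer_id INT, product_key INT)")
conn.executemany("INSERT INTO product VALUES (?)", [(5,), (6,)])

# Hypothetical purchases:
#   customer 1 buys products 5 and 6 once each,
#   customer 2 buys product 5 twice and never buys 6,
#   customer 3 buys product 5 twice and product 6 once.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, 5), (1, 6), (2, 5), (2, 5), (3, 5), (3, 5), (3, 6)],
)


def buyers_of_everything(count_expr):
    """Return customer ids whose purchase count equals the product count."""
    query = f"""
    SELECT customer_id
    FROM customer
    GROUP BY customer_id
    HAVING {count_expr} = (SELECT COUNT(DISTINCT product_key) FROM product)
    ORDER BY customer_id
    """
    return [row[0] for row in conn.execute(query)]


# Counting rows treats a repeat purchase as if it were a new product.
without_distinct = buyers_of_everything("COUNT(product_key)")
# Counting distinct keys ignores repeat purchases.
with_distinct = buyers_of_everything("COUNT(DISTINCT product_key)")

print(without_distinct)  # [1, 2]
print(with_distinct)     # [1, 3]
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Without DISTINCT, customer 2 is wrongly included (two purchases of the same product look like two products) and customer 3 is wrongly excluded (three purchases no longer equal the product count of two).&lt;/p&gt;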
&lt;p data-ke-size=&quot;size16&quot;&gt;So I added DISTINCT, and that solved it.&lt;/p&gt;</description>
      <category>Sparta/CODEKATA</category>
      <author>junecho</author>
      <guid isPermaLink="true">https://junecho.tistory.com/65</guid>
      <comments>https://junecho.tistory.com/65#entry65comment</comments>
      <pubDate>Wed, 24 Sep 2025 16:06:06 +0900</pubDate>
    </item>
  </channel>
</rss>