feat: switch to a single consolidated JSON structure + social proof fallback

What we did:
- data_consolidator.py: moved all normalization and calculation logic out of main.py
- Dashboard endpoint shrank from 1150 lines to 25 (main.py -1730/+1880 net)
- A consolidated file (report_{id}_data.json) is created automatically when enrichment finishes
- Old reports are consolidated via lazy migration on their first dashboard request
- Added a baskets fallback because the Trendyol API no longer returns order-count
- Applied the priority order inline socialProofs (scrape) > enrichment API
- Frontend KPI titles now change dynamically depending on whether orders or baskets are used
- Added logging_config.py, category_seeder.py, and an Alembic migration
- Tested 9 tabs with Playwright; all data is correct

Why we did it:
- Merging 3 different sources on every request caused data inconsistency and slowness
- With a single consolidated JSON file, the dashboard loads instantly
- Order data was being lost because of the Trendyol API change; the baskets fallback resolves this
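
A minimal sketch of the lazy-migration path described above, for illustration only. The helper name `load_consolidated` and the caller-supplied `build` function are assumptions and do not come from this commit; only the `report_{id}_data.json` naming is taken from the message.

```python
import json
from pathlib import Path


def load_consolidated(report_id, reports_dir, build):
    """Return consolidated report data, creating the file on first access.

    `build` is a hypothetical callable that merges the raw sources for one report.
    """
    path = Path(reports_dir) / f"report_{report_id}_data.json"
    if path.exists():
        # Fast path: the consolidated file already exists, so just read it
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    # Lazy migration: an older report is consolidated once, then persisted
    data = build(report_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    return data
```

The point of the design is that the expensive merge runs at most once per report; every later dashboard request is a plain file read.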

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
furkanyigit34
2026-03-28 22:25:25 +03:00
parent 187c59ec9b
commit ce1dc1e25f
15 changed files with 1878 additions and 1459 deletions

137
CLAUDE.md

@@ -1,12 +1,12 @@
# CLAUDE.md
This file is the project guide for Claude Code (claude.ai/code).
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Summary
**Trendyol Product Dashboard**: A category-based product analysis system for the Trendyol e-commerce platform. 7-tab dashboard, automatic report generation, and social proof metrics.
**Trendyol Product Dashboard**: A category-based product analysis system for the Trendyol e-commerce platform. 9-tab dashboard, automatic report generation, social proof metrics, and hidden champion analysis.
**Stack**: FastAPI + React 19 + Vite + SQLite + Tailwind CSS
**Stack**: FastAPI + React 19 + Vite + PostgreSQL + Tailwind CSS
## Development Commands
@@ -15,17 +15,33 @@ Bu dosya Claude Code (claude.ai/code) için proje rehberidir.
python3 start.py
# Manual start (two terminals)
cd backend && python3 main.py # Terminal 1 - Backend
cd admin-panel && npm run dev # Terminal 2 - Frontend
cd backend && python3 main.py # Terminal 1 - Backend (port 8001)
cd admin-panel && npm run dev # Terminal 2 - Frontend (port 5173)
# Install dependencies
cd backend && pip install -r requirements.txt # Python
cd admin-panel && npm install # Node.js
# Other commands
cd admin-panel && npm run build # Frontend build
cd admin-panel && npm run lint # Lint
cd backend && python3 -c "from database import init_db; init_db()" # DB init
# Build & lint
cd admin-panel && npm run build # Frontend production build
cd admin-panel && npm run lint # ESLint
# Backend tests
cd backend && pytest # All tests
cd backend && pytest tests/test_cache.py # Single test file
cd backend && pytest tests/test_cache.py -k "test_ttl" # Single test
# Frontend E2E tests (Playwright)
cd admin-panel && npx playwright test # All E2E tests
cd admin-panel && npx playwright test tests/rare-keywords.spec.js # Single spec
# Run with Docker
./build-docker.sh && ./start-docker.sh # Build + start
./stop-docker.sh # Stop
# DB migration
cd backend && alembic upgrade head # Apply migrations
cd backend && alembic revision --autogenerate -m "description" # New migration
```
**Access URLs**:
@@ -39,23 +55,36 @@ cd backend && python3 -c "from database import init_db; init_db()" # DB init
### 3-Tier Structure
```
React Frontend (admin-panel/)  →  FastAPI Backend (backend/)  →  SQLite + JSON
├── CategoryManagement.jsx        ├── main.py (~4400 lines)       ├── trendyol.db
├── ReportGeneration.jsx          ├── database.py                 ├── categories/*.json
├── ReportList.jsx                └── scraper.py                  └── reports/*.json
└── ReportDashboard.jsx (7 tab)
React Frontend (admin-panel/)  →  FastAPI Backend (backend/)  →  PostgreSQL + JSON
├── ReportDashboard.jsx (9 tab)   ├── main.py (~5000 lines)       ├── trendyol_db
├── ReportGeneration.jsx          ├── database.py (ORM)           ├── categories/*.json
├── ReportList.jsx                ├── scraper.py                  └── reports/*.json
├── ReportComparison.jsx          ├── google_trends_helper.py
└── CategoryManagement.jsx        └── analytics/
                                      ├── metrics.py
                                      └── champion_finder.py
```
### Dashboard Tabs (7)
### Frontend Routes
| Path | Component | Description |
|------|-----------|-------------|
| `/` or `/report` | ReportGeneration | Create a new report |
| `/reports` | ReportList | Saved reports |
| `/reports/:reportId` | ReportDashboard | 9-tab analysis dashboard |
| `/compare` | ReportComparison | Side-by-side report comparison |
### Dashboard Tabs (9)
| Tab ID | Tab Name | Component | Description |
|--------|----------|-----------|-------------|
| overview | Genel Bakış | OverviewTab | KPIs, summary charts |
| brand | Marka | BrandTab | Brand analysis, market share |
| category | Kategori | CategoryTab | Category distribution |
| origin | Menşei | OriginTab | Country-based analysis |
| barcode | Barkod | BarcodeTab | Barcode data analysis |
| keyword | Keyword Aracı | KeywordTab | Keyword analysis |
| barcode | Barkod | BarcodeTab | Barcode/GS1 origin analysis |
| keyword | Keyword Aracı | KeywordTab | Keyword analysis + Google Trends |
| product-finder | Ürün Bulma | ProductFinderTab | Product search/filtering |
| hidden-champions | Gizli Şampiyonlar | HiddenChampionsTab | Low-review, high-rating opportunities |
| opportunity | Fırsat Analizi | OpportunityTab | Market opportunity analysis |
### Data Flow
@@ -77,12 +106,12 @@ React Frontend (admin-panel/) → FastAPI Backend (backend/) → SQLite +
**Use the precomputed objects returned by the backend; do NOT compute from raw data:**
```jsx
// CORRECT - use the precomputed data
// CORRECT - use the precomputed data
const kpis = dashboardData?.kpis || {};
const topProducts = dashboardData?.charts?.top_products || [];
const topBrands = dashboardData?.charts?.top_brands || [];
// WRONG - do not compute from all_products
// WRONG - do not compute from all_products
const total = dashboardData?.all_products.reduce((sum, p) => sum + p.price, 0);
```
@@ -97,12 +126,11 @@ Frontend hesaplamalı veri, alan adı uyumsuzluğuna yol açabilir. Detay için:
**Solution Pattern - Mapping Layer**:
```jsx
// Transform the data into the shape the component expects
const transformed = sourceData.map(item => ({
country: item.name, // Map to the expected field
name: item.name, // Keep the original
count: item.productCount, // Map to the expected field
productCount: item.productCount // Keep the original
country: item.name,
name: item.name,
count: item.productCount,
productCount: item.productCount
}));
```
@@ -111,7 +139,7 @@ const transformed = sourceData.map(item => ({
1. Add the tab config to `src/constants/tabGroups.js`
2. Create the tab component under `src/components/dashboard-tabs/`
3. Import it in `ReportDashboard.jsx` and add a render block
4. **Always add a console.log for data transformations**
4. If needed, add a new endpoint to the backend (`main.py`)
## API Integration
@@ -123,15 +151,10 @@ const transformed = sourceData.map(item => ({
| ENRICHMENT | 120s | Social proof enrichment |
| KEYWORD_ANALYSIS | 300s | Keyword analysis |
### Polling Pattern
```jsx
// Exponential backoff with jitter (1s → 5s max)
import { fetchWithTimeout, API_BASE_URL } from '../config/api';
```
### Rate Limit
- Social proof API: 2 requests/second
- Exponential backoff is used (achieved a 75% reduction in requests)
### Rate Limit & Resilience
- Social proof API: 2 requests/second (RateLimiter)
- Circuit breaker pattern for external API calls
- Exponential backoff with jitter (1s → 5s max)
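
The backoff rule above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `backoff_delays`, its parameters, and the choice of "equal jitter" (half fixed, half random) are hypothetical; only the 1s base and 5s cap come from the text.

```python
import random


def backoff_delays(attempts, base=1.0, cap=5.0, rng=random.random):
    """Yield one delay in seconds per retry attempt.

    The ceiling doubles each attempt (base, 2*base, 4*base, ...) and is capped.
    Equal jitter: half of the ceiling is fixed, the other half is randomized.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield ceiling / 2 + rng() * ceiling / 2


if __name__ == "__main__":
    for delay in backoff_delays(4):
        print(f"would wait {delay:.2f}s before the next retry")
```

Injecting `rng` keeps the schedule deterministic in tests while production code uses real randomness.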
## Code Change Rules
@@ -141,18 +164,45 @@ import { fetchWithTimeout, API_BASE_URL } from '../config/api';
- Long-running operations: BackgroundTasks + progress polling endpoint
- External API calls: always pass a timeout parameter
- Cache: use BoundedCache (never use an unbounded dict)
- Analytics calculations: put them in the `analytics/` module (metrics.py, champion_finder.py)
### Frontend
- Use `fetchWithTimeout` (from `src/config/api.js`)
- Show a loading state for async operations
- Apply request deduplication to concurrent calls
- Charts: use Recharts; data transformation lives in `utils/chartTransformers.js`
- Export: CSV/Excel via `utils/exportUtils.js`
### CORS Changes
For new frontend ports, add them to the CORS allowlist in `main.py` (lines 34-45):
For new frontend ports, add them to the CORS allowlist in `main.py`:
```python
allow_origins=["http://localhost:5173", "http://localhost:5174", ...]
```
## Database
**Dev**: `postgresql://postgres:trendyol123@localhost:5433/trendyol_db`
**Docker**: `postgresql://postgres:trendyol123@postgres:5432/trendyol_db`
Migrations: Alembic (`backend/alembic/`). Run `alembic revision --autogenerate` on every schema change.
| Model | Purpose | Key Fields |
|-------|---------|------------|
| Category | Hierarchical category tree | `parent_id` (self-ref), `trendyol_category_id` |
| Snapshot | Monthly data snapshots | `category_id`, `json_file_path` |
| Report | Saved reports | `category_id`, `json_file_path` |
| EnrichmentError | API error logs | `endpoint`, `error_type`, `status_code` |
## Deployment
**Platform**: Coolify + Docker Compose + Traefik reverse proxy
Docker Compose services: `postgres` (15-alpine), `backend` (FastAPI), `frontend` (Nginx)
`startup.sh` order: wait for the PostgreSQL connection → Alembic migration → category seeding → start Uvicorn
Traefik SSE streaming support: 100ms flush interval (for report progress)
## Resource Limits
| Resource | Limit |
@@ -163,26 +213,11 @@ allow_origins=["http://localhost:5173", "http://localhost:5174", ...]
| Social proof batch | 5 products/request |
| Rate limit | 2 requests/second (social proof) |
## Critical Dependencies
**Backend**: FastAPI 0.104.1, SQLAlchemy 2.0.45, Uvicorn 0.24.0, Requests 2.31.0, Pytrends 4.9.2
**Frontend**: React 19.2.0, Vite 7.2.2, Recharts 3.4.1, Tailwind CSS 4.1.17, Axios 1.13.2
## Database Models
| Model | Purpose | Key Fields |
|-------|---------|------------|
| Category | Hierarchical category tree | `parent_id` (self-ref), `trendyol_category_id` |
| Snapshot | Monthly data snapshots | `category_id`, `json_file_path` |
| Report | Saved reports | `category_id`, `json_file_path` |
| EnrichmentError | API error logs | `endpoint`, `error_type`, `status_code` |
## Documentation
| File | Purpose |
|------|---------|
| docs/DASHBOARD_ARCHITECTURE.md | **Important** - Dashboard data structures |
| docs/DASHBOARD_ARCHITECTURE.md | Dashboard data structures and KPI definitions |
| docs/bug-fixes/ORIGINTAB_BUG_FIX.md | **Critical** - Field-name mismatch pattern |
| docs/API_DOCUMENTATION.md | Full API reference |
| docs/ARCHITECTURE.md | System architecture (Turkish) |


@@ -99,17 +99,27 @@ function ReportDashboard() {
const products = dashboardData.all_products
const totalProducts = products.length
const totalOrders = products.reduce((sum, p) => sum + (p.orders || 0), 0)
const rawOrders = products.reduce((sum, p) => sum + (p.orders || 0), 0)
const totalBaskets = products.reduce((sum, p) => sum + (p.baskets || 0), 0)
// The Trendyol API no longer returns order-count: use orders when > 0, otherwise fall back to baskets
const totalOrders = rawOrders > 0 ? rawOrders : totalBaskets
const ordersLabel = rawOrders > 0 ? 'orders' : 'baskets'
const totalViews = products.reduce((sum, p) => sum + (p.page_views || 0), 0)
const totalFavorites = products.reduce((sum, p) => sum + (p.favorites || 0), 0)
// Guard against an empty product list so avgPrice does not become NaN
const avgPrice = totalProducts > 0 ? products.reduce((sum, p) => sum + (p.price || 0), 0) / totalProducts : 0
const totalRevenue = products.reduce((sum, p) => sum + ((p.price || 0) * (p.orders || 0)), 0)
const totalRevenue = rawOrders > 0
? products.reduce((sum, p) => sum + ((p.price || 0) * (p.orders || 0)), 0)
: products.reduce((sum, p) => sum + ((p.price || 0) * (p.baskets || 0)), 0)
const kpis = {
totalProducts,
totalOrders,
totalBaskets,
totalViews,
totalFavorites,
avgPrice: Math.round(avgPrice),
totalRevenue: Math.round(totalRevenue)
totalRevenue: Math.round(totalRevenue),
ordersLabel
}
console.log('✅ [KPI] Calculated KPIs:', kpis)
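
For readers following the fallback rule in the hunk above, here is a small Python restatement of the same logic. It is a sketch only: the function name `summarize_kpis` is hypothetical, and the empty-list guard on the average is an added assumption rather than something shown in the diff.

```python
def summarize_kpis(products):
    """Aggregate KPIs, falling back to baskets when no order counts exist."""
    raw_orders = sum(p.get("orders") or 0 for p in products)
    total_baskets = sum(p.get("baskets") or 0 for p in products)
    # Use orders when any are present; otherwise baskets stand in for them
    label = "orders" if raw_orders > 0 else "baskets"
    total_revenue = sum((p.get("price") or 0) * (p.get(label) or 0) for p in products)
    # Assumption: an empty product list yields an average price of 0
    avg_price = sum(p.get("price") or 0 for p in products) / len(products) if products else 0
    return {
        "totalProducts": len(products),
        "totalOrders": raw_orders if raw_orders > 0 else total_baskets,
        "totalBaskets": total_baskets,
        "avgPrice": round(avg_price),
        "totalRevenue": round(total_revenue),
        "ordersLabel": label,
    }
```

The `ordersLabel` field is what lets the UI choose between the "orders" and "baskets" KPI titles, so revenue computed from baskets is never presented as realized sales.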


@@ -12,8 +12,8 @@ export default function HiddenChampionsTab({ reportId }) {
// Filters
const [minRating, setMinRating] = useState(4.0)
const [maxReview, setMaxReview] = useState(100)
const [minOrders, setMinOrders] = useState(5)
const [sortKey, setSortKey] = useState('performance_score')
const [minOrders, setMinOrders] = useState(0)
const [sortKey, setSortKey] = useState('hidden_champion_score')
const [sortDir, setSortDir] = useState('desc')
const [showFilters, setShowFilters] = useState(false)
@@ -41,9 +41,9 @@ export default function HiddenChampionsTab({ reportId }) {
// Filtered & sorted products
const filteredProducts = useMemo(() => {
if (!data?.products) return []
if (!data?.hidden_champions) return []
return data.products
return data.hidden_champions
.filter(p => {
const rating = p.rating || 0
const reviewCount = p.review_count || p.reviewCount || 0
@@ -230,10 +230,10 @@ export default function HiddenChampionsTab({ reportId }) {
</th>
<th
className="text-right px-4 py-3 font-medium text-slate-500 cursor-pointer hover:text-slate-700"
onClick={() => handleSort('performance_score')}
onClick={() => handleSort('hidden_champion_score')}
>
<div className="flex items-center justify-end gap-1">
Skor <SortIcon column="performance_score" />
Skor <SortIcon column="hidden_champion_score" />
</div>
</th>
</tr>
@@ -287,13 +287,13 @@ export default function HiddenChampionsTab({ reportId }) {
</td>
<td className="px-4 py-3 text-right">
<span className={`inline-flex items-center px-2 py-0.5 rounded-full text-xs font-bold ${
(product.performance_score || 0) >= 70
(product.hidden_champion_score || 0) >= 70
? 'bg-emerald-100 text-emerald-700'
: (product.performance_score || 0) >= 40
: (product.hidden_champion_score || 0) >= 40
? 'bg-amber-100 text-amber-700'
: 'bg-slate-100 text-slate-600'
}`}>
{(product.performance_score || 0).toFixed(0)}
{(product.hidden_champion_score || 0).toFixed(0)}
</span>
</td>
</tr>


@@ -90,21 +90,21 @@ export default function OverviewTab({
? (sortedPrices[sortedPrices.length / 2 - 1] + sortedPrices[sortedPrices.length / 2]) / 2
: sortedPrices[Math.floor(sortedPrices.length / 2)]
const bucketCount = 10
const range = max - min || 1
const bucketSize = range / bucketCount
// Use predefined price ranges for meaningful distribution
const ranges = [
[0, 50], [50, 100], [100, 200], [200, 500],
[500, 1000], [1000, 2000], [2000, 5000], [5000, 10000], [10000, Infinity]
]
const buckets = Array.from({ length: bucketCount }, (_, i) => ({
range: `${Math.round(min + i * bucketSize)}-${Math.round(min + (i + 1) * bucketSize)}`,
min: min + i * bucketSize,
max: min + (i + 1) * bucketSize,
count: 0
// Filter out empty ranges and build buckets
const buckets = ranges
.map(([lo, hi]) => ({
range: hi === Infinity ? `${lo.toLocaleString('tr-TR')}+` : `${lo.toLocaleString('tr-TR')}-${hi.toLocaleString('tr-TR')}`,
min: lo,
max: hi,
count: prices.filter(p => p >= lo && (hi === Infinity ? true : p < hi)).length
}))
prices.forEach(price => {
const idx = Math.min(Math.floor((price - min) / bucketSize), bucketCount - 1)
buckets[idx].count++
})
.filter(b => b.count > 0)
return { buckets, mean: Math.round(mean), median: Math.round(median) }
}, [allProducts])
@@ -186,7 +186,7 @@ export default function OverviewTab({
color="blue"
/>
<KpiCard
title="Toplam Satın Alma"
title={overviewKPIs.ordersLabel === 'baskets' ? 'Toplam Sepete Ekleme' : 'Toplam Satın Alma'}
value={overviewKPIs.totalOrders.toLocaleString('tr-TR')}
icon={ShoppingCart}
color="emerald"
@@ -198,7 +198,7 @@ export default function OverviewTab({
color="violet"
/>
<KpiCard
title="Toplam Ciro"
title={overviewKPIs.ordersLabel === 'baskets' ? 'Tahmini Ciro (Sepet)' : 'Toplam Ciro'}
value={`${(overviewKPIs.totalRevenue || 0).toLocaleString('tr-TR')}`}
icon={DollarSign}
color="orange"
@@ -359,10 +359,10 @@ export default function OverviewTab({
contentStyle={{ borderRadius: '8px', border: '1px solid #e2e8f0' }}
/>
<ReferenceLine
x={priceDistribution.buckets.findIndex(b => b.min <= priceDistribution.mean && b.max > priceDistribution.mean)}
x={(priceDistribution.buckets.find(b => b.min <= priceDistribution.mean && (b.max === Infinity || b.max > priceDistribution.mean)) || {}).range}
stroke="#f97316"
strokeDasharray="5 5"
label={{ value: `Ort: ₺${priceDistribution.mean}`, fill: '#f97316', fontSize: 11, position: 'top' }}
label={{ value: `Ort: ₺${priceDistribution.mean.toLocaleString('tr-TR')}`, fill: '#f97316', fontSize: 11, position: 'top' }}
/>
<Bar dataKey="count" fill="#6366f1" radius={[4, 4, 0, 0]} label={{ position: 'top', fill: '#64748b', fontSize: 11 }} />
</BarChart>
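
The switch from equal-width buckets to predefined price ranges can be illustrated in Python. This is a sketch with hypothetical names (`PRICE_RANGES`, `bucket_prices`, `bucket_label_for`); the range boundaries are copied from the hunk above, and the `tr-TR` number formatting of the labels is deliberately left out.

```python
import math

# Range boundaries copied from the diff; the last bucket is open-ended
PRICE_RANGES = [
    (0, 50), (50, 100), (100, 200), (200, 500),
    (500, 1000), (1000, 2000), (2000, 5000), (5000, 10000), (10000, math.inf),
]


def bucket_prices(prices):
    """Count prices per half-open range [lo, hi) and drop empty ranges."""
    buckets = []
    for lo, hi in PRICE_RANGES:
        count = sum(1 for p in prices if lo <= p < hi)
        if count == 0:
            continue
        label = f"{lo}+" if math.isinf(hi) else f"{lo}-{hi}"
        buckets.append({"range": label, "min": lo, "max": hi, "count": count})
    return buckets


def bucket_label_for(value, buckets):
    """Return the label of the bucket containing value, or None if it was filtered out."""
    for b in buckets:
        if b["min"] <= value < b["max"]:
            return b["range"]
    return None
```

One consequence worth noting: because empty ranges are filtered out, the mean can fall into a range that has no bucket, in which case there is no label to anchor a reference line on. With skewed data (a few very expensive products) this is plausible, so the caller should be ready for a missing label.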


@@ -30,7 +30,7 @@ COPY backend/ .
COPY categories/ /data/initial-categories/
# Create data directories with proper permissions
RUN mkdir -p /data/categories /data/reports && \
RUN mkdir -p /data/categories /data/reports /data/logs && \
chmod -R 755 /data
# Make startup script executable (before switching to non-root user)


@@ -0,0 +1,30 @@
"""add path_model to categories
Revision ID: 38207dbbac44
Revises: 001
Create Date: 2026-03-28 14:56:06.784769
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '38207dbbac44'
down_revision: Union[str, None] = '001'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column('categories', sa.Column('path_model', sa.String(), nullable=True))
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column('categories', 'path_model')
# ### end Alembic commands ###


@@ -17,6 +17,51 @@ class HiddenChampionFinder:
Uses customized filters in fragmented markets (low HHI)
"""
    @staticmethod
    def _parse_social_proof_value(value_str: str) -> int:
        """Convert values such as '3k', '248k', '1.2k', '866' to an integer."""
        if not value_str:
            return 0
        value_str = str(value_str).strip().lower()
        # Keep the decimal point for suffixed values ('1.2k' -> 1200);
        # stripping every '.' first would turn '1.2k' into 12000.
        for suffix, multiplier in (("k", 1_000), ("m", 1_000_000)):
            if value_str.endswith(suffix):
                try:
                    return int(round(float(value_str[:-1].replace(",", ".")) * multiplier))
                except (ValueError, TypeError):
                    return 0
        try:
            # Plain numbers: treat '.' as a thousands separator ('1.234' -> 1234)
            return int(value_str.replace(".", ""))
        except (ValueError, TypeError):
            return 0
@staticmethod
def _extract_social_proofs(product: Dict) -> Dict[str, int]:
"""Extract data from the product's socialProofs array"""
result = {"page_views": 0, "orders": 0, "baskets": 0, "favorites": 0}
social_proofs = product.get("socialProofs", [])
if not social_proofs:
return result
type_map = {
"pageViewCount": "page_views",
"orderCountL3D": "orders",
"orderCountL365D": "orders",
"basketCount": "baskets",
"favoriteCount": "favorites",
}
for sp in social_proofs:
sp_type = sp.get("type", "")
mapped = type_map.get(sp_type)
if mapped:
val = HiddenChampionFinder._parse_social_proof_value(sp.get("value", "0"))
# Keep the larger value (orderCountL3D vs orderCountL365D)
if val > result[mapped]:
result[mapped] = val
return result
def find(
self,
products: List[Dict],
@@ -98,10 +143,12 @@ class HiddenChampionFinder:
pid = str(product.get("id"))
social = social_details.get(pid, {})
page_views = social.get("page_views", 0) or 0
orders = social.get("orders", 0) or 0
baskets = social.get("baskets", 0) or 0
favorites = social.get("favorites", 0) or 0
# Enriched social data first, then the product's own socialProofs
embedded_social = self._extract_social_proofs(product)
page_views = social.get("page_views", 0) or embedded_social["page_views"] or 0
orders = social.get("orders", 0) or embedded_social["orders"] or product.get("orders", 0) or 0
baskets = social.get("baskets", 0) or embedded_social["baskets"] or 0
favorites = social.get("favorites", 0) or embedded_social["favorites"] or 0
conversion_rate = (orders / page_views * 100) if page_views > 0 else 0
@@ -139,15 +186,28 @@ class HiddenChampionFinder:
# Minimum orders check (sales data is very important)
min_orders = filters.get("min_orders", 1) # Default: at least 1 sale
# Check whether social data is available
has_social = pid in social_details and page_views > 0
# Customized filtering (more flexible)
if has_social:
# Products with social data: full filter
passes_filter = (
rating >= filters.get("min_rating", 4.6) and
review_count < filters.get("max_review_count", 30) and
review_count >= 1 and # Must have at least 1 review
orders >= min_orders and # MUST HAVE AT LEAST 1 SALE (sales data is very important)
(page_views >= threshold_views or page_views >= min_views_threshold) and # Above the category average OR the minimum threshold
(baskets >= threshold_baskets or baskets >= min_baskets_threshold) and # Baskets also above the category average OR the minimum
(conversion_rate >= 1.0 or page_views >= 500) # Minimum 1% conversion OR high page views
review_count >= 1 and
orders >= min_orders and
(page_views >= threshold_views or page_views >= min_views_threshold) and
(baskets >= threshold_baskets or baskets >= min_baskets_threshold) and
(conversion_rate >= 1.0 or page_views >= 500)
)
else:
# Products without social data: only the rating + review + orders filter
passes_filter = (
rating >= filters.get("min_rating", 4.6) and
review_count < filters.get("max_review_count", 30) and
review_count >= 1 and
orders >= min_orders
)
if passes_filter:
@@ -196,7 +256,7 @@ class HiddenChampionFinder:
"category": category_name,
"rating": round(rating, 2),
"review_count": review_count,
"price": product.get("price", {}).get("sellingPrice", 0),
"price": (product.get("price", {}).get("sellingPrice", 0) or product.get("price", {}).get("discountedPrice", 0) or product.get("price", {}).get("current", 0)) if isinstance(product.get("price"), dict) else (product.get("price", 0) or 0),
"page_views": page_views,
"orders": orders,
"baskets": baskets,
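
As a standalone illustration of parsing compact counts such as '3k', '1.2k' or '866', the sketch below keeps the decimal point so that '1.2k' becomes 1200. The name `parse_compact_count` and the suffix table are hypothetical, and the treatment of ',' as a decimal separator is an assumption about the input format.

```python
SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_compact_count(text):
    """Parse '3k', '1.2k', '866', '10k+' into an int; return 0 when unparseable."""
    s = str(text or "").strip().lower().rstrip("+")
    # Assumption: ',' is a decimal separator in suffixed values ('1,2k')
    s = s.replace(",", ".")
    if not s:
        return 0
    multiplier = 1
    if s[-1] in SUFFIXES:
        multiplier = SUFFIXES[s[-1]]
        s = s[:-1]
    try:
        return int(round(float(s) * multiplier))
    except (ValueError, OverflowError):
        # Covers empty remainders, non-numeric text, and 'nan'/'inf'
        return 0
```

Note the ambiguity this sketch does not resolve: an unsuffixed '1.234' is read as a decimal here, not as 1234 with a thousands separator. Which reading is right depends on the upstream data.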


@@ -245,7 +245,13 @@ def get_rating_value(product: Dict) -> float:
rating = product.get("rating", 0)
if isinstance(rating, dict):
return rating.get("averageRating", 0) or 0
return float(rating) if rating else 0
if rating:
return float(rating)
# Fallback: ratingScore nested object
rating_score = product.get("ratingScore", {})
if isinstance(rating_score, dict):
return float(rating_score.get("averageRating", 0) or 0)
return 0
def get_review_count(product: Dict) -> int:
@@ -263,6 +269,11 @@ def get_review_count(product: Dict) -> int:
rating = product.get("rating", {})
if isinstance(rating, dict):
review_count = rating.get("totalComments", 0) or rating.get("totalCount", 0) or 0
if not review_count:
# Fallback: ratingScore nested object
rating_score = product.get("ratingScore", {})
if isinstance(rating_score, dict):
review_count = rating_score.get("totalCount", 0) or 0
return int(review_count) if review_count else 0

143
backend/category_seeder.py Normal file

@@ -0,0 +1,143 @@
"""
Category Seeder - imports Trendyol categories from JSON into the DB
Source: /Users/furkanyigit/Desktop/trendyol_categories.json
3-level hierarchy: Segment (Kadın) → Group (Giyim) → Leaf (Elbise)
"""
import json
import re
import os
from database import SessionLocal, Category, Snapshot, Report, EnrichmentError
from logging_config import get_logger
log = get_logger("seeder")
DEFAULT_JSON_PATH = os.path.expanduser("~/Desktop/trendyol_categories.json")
def parse_url(url: str) -> dict:
"""Extract path_model and trendyol_category_id from the URL.
Examples:
/elbise-x-c56 → path_model="elbise-x-c56", category_id=56
/kanvas-canta-y-s20972 → path_model="kanvas-canta-y-s20972", category_id=None
/kadin-giyim-x-g1-c82 → path_model="kadin-giyim-x-g1-c82", category_id=82
"""
# Strip leading slash
path_model = url.lstrip("/")
# Try to extract -c{id} from the end
m = re.search(r"-c(\d+)$", path_model)
category_id = int(m.group(1)) if m else None
return {
"path_model": path_model,
"trendyol_category_id": category_id,
}
def seed_from_json(json_path: str = None, clear_existing: bool = True) -> dict:
"""Read the JSON file and write it to the DB.
Returns:
{"segments": int, "groups": int, "leaves": int, "total": int}
"""
json_path = json_path or DEFAULT_JSON_PATH
with open(json_path, "r", encoding="utf-8") as f:
data = json.load(f)
db = SessionLocal()
try:
if clear_existing:
# Because of FK constraints, clear the referencing tables first
db.query(EnrichmentError).delete(synchronize_session=False)
db.query(Report).delete(synchronize_session=False)
db.query(Snapshot).delete(synchronize_session=False)
db.query(Category).filter(Category.parent_id != None).delete(synchronize_session=False) # noqa: E711
db.query(Category).delete(synchronize_session=False)
db.commit()
log.info("Mevcut kategoriler ve bağlı veriler silindi")
stats = {"segments": 0, "groups": 0, "leaves": 0, "total": 0}
for segment_name, groups in data.items():
# Level 1: Segment (Kadın, Erkek, ...)
segment = Category(
name=segment_name,
parent_id=None,
trendyol_category_id=None,
trendyol_url=None,
path_model=None,
is_active=True,
)
db.add(segment)
db.flush() # Obtain the generated ID
stats["segments"] += 1
stats["total"] += 1
for group_item in groups:
group_name = group_item["name"]
group_url = group_item.get("url", "")
group_parsed = parse_url(group_url) if group_url else {"path_model": None, "trendyol_category_id": None}
children = group_item.get("children", [])
if children:
# Level 2: Group (Giyim, Ayakkabı, ...)
group = Category(
name=group_name,
parent_id=segment.id,
trendyol_category_id=group_parsed["trendyol_category_id"],
trendyol_url=f"https://www.trendyol.com{group_url}" if group_url else None,
path_model=group_parsed["path_model"],
is_active=True,
)
db.add(group)
db.flush()
stats["groups"] += 1
stats["total"] += 1
for leaf_item in children:
leaf_url = leaf_item.get("url", "")
leaf_parsed = parse_url(leaf_url) if leaf_url else {"path_model": None, "trendyol_category_id": None}
leaf = Category(
name=leaf_item["name"],
parent_id=group.id,
trendyol_category_id=leaf_parsed["trendyol_category_id"],
trendyol_url=f"https://www.trendyol.com{leaf_url}" if leaf_url else None,
path_model=leaf_parsed["path_model"],
is_active=True,
)
db.add(leaf)
stats["leaves"] += 1
stats["total"] += 1
else:
# No children: this group is actually a leaf
leaf = Category(
name=group_name,
parent_id=segment.id,
trendyol_category_id=group_parsed["trendyol_category_id"],
trendyol_url=f"https://www.trendyol.com{group_url}" if group_url else None,
path_model=group_parsed["path_model"],
is_active=True,
)
db.add(leaf)
stats["leaves"] += 1
stats["total"] += 1
db.commit()
log.info(f"Seed tamamlandı: {stats}")
return stats
except Exception as e:
db.rollback()
log.error(f"Seed hatası: {e}")
raise
finally:
db.close()
if __name__ == "__main__":
result = seed_from_json()
print(f"Seed tamamlandı: {result}")
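
The URL parsing rule in `parse_url` can be checked in isolation. The block below copies the function body from the diff above so it runs on its own (without the database or logging imports) and exercises the three examples from its docstring.

```python
import re


def parse_url(url: str) -> dict:
    """Extract path_model and trendyol_category_id from a category URL."""
    # Strip the leading slash
    path_model = url.lstrip("/")
    # A category ID is only recognized as a trailing "-c<digits>" suffix
    m = re.search(r"-c(\d+)$", path_model)
    category_id = int(m.group(1)) if m else None
    return {"path_model": path_model, "trendyol_category_id": category_id}


if __name__ == "__main__":
    for url in ("/elbise-x-c56", "/kanvas-canta-y-s20972", "/kadin-giyim-x-g1-c82"):
        print(url, parse_url(url))
```

Because the pattern is anchored at the end of the path, URLs ending in `-s<digits>` keep their `path_model` but get no category ID, which matches the second docstring example.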


@@ -0,0 +1,791 @@
"""
Data Consolidator: module that builds the single consolidated JSON.
Once scraping + enrichment finish, it performs all normalization and calculation
and saves the result as reports/report_{id}_data.json.
The dashboard endpoint only reads this file.
"""
import json
import os
import re
import time
import random
from collections import defaultdict
from datetime import datetime
import numpy as np
from logging_config import get_logger
log = get_logger("consolidator")
# ─────────────────────────────────────────────────────────
# Country code → full name mapping (for origin analysis)
# ─────────────────────────────────────────────────────────
COUNTRY_NAMES = {
"TR": "Türkiye", "CN": "Çin", "US": "Amerika", "GB": "İngiltere",
"FR": "Fransa", "DE": "Almanya", "IT": "İtalya", "ES": "İspanya",
"KR": "Güney Kore", "JP": "Japonya", "IN": "Hindistan", "TW": "Tayvan",
"HK": "Hong Kong", "TH": "Tayland", "VN": "Vietnam", "PL": "Polonya",
"CZ": "Çek Cumhuriyeti", "RO": "Romanya", "BG": "Bulgaristan",
"GR": "Yunanistan", "PT": "Portekiz", "NL": "Hollanda", "BE": "Belçika",
"CH": "İsviçre", "AT": "Avusturya", "SE": "İsveç", "NO": "Norveç",
"DK": "Danimarka", "FI": "Finlandiya", "RU": "Rusya", "UA": "Ukrayna",
"AE": "Birleşik Arap Emirlikleri", "SA": "Suudi Arabistan", "IL": "İsrail",
"EG": "Mısır", "ZA": "Güney Afrika", "BR": "Brezilya", "MX": "Meksika",
"CA": "Kanada", "AU": "Avustralya", "NZ": "Yeni Zelanda", "SG": "Singapur",
"MY": "Malezya", "ID": "Endonezya", "PH": "Filipinler", "PK": "Pakistan",
"BD": "Bangladeş", "AZ": "Azerbaycan",
}
# Barcode prefix → country (EAN-13)
BARCODE_COUNTRIES = {
"TYB": "Trendyol (İç Barkod)", "SGT": "Trendyol Satıcı",
"KPE": "Trendyol Kampanya", "RTN": "Trendyol İade", "CDM": "Trendyol Özel",
"00-13": "ABD & Kanada", "190-199": "Rezerve/Özel Kullanım",
"20-29": "Mağaza İçi Kullanım", "30-37": "Fransa",
"380": "Bulgaristan", "383": "Slovenya", "370": "Litvanya",
"372": "Estonya", "373": "Moldova", "375": "Belarus",
"377": "Ermenistan", "379": "Kazakistan", "385": "Hırvatistan",
"387": "Bosna Hersek", "400-440": "Almanya", "45-49": "Japonya",
"50": "İngiltere", "520-521": "Yunanistan", "528": "Lübnan",
"529": "Kıbrıs", "530": "Arnavutluk", "531": "Makedonya",
"535": "Malta", "539": "İrlanda", "54": "Belçika & Lüksemburg",
"560": "Portekiz", "569": "İzlanda", "57": "Danimarka",
"590": "Polonya", "594": "Romanya", "599": "Macaristan",
"600-601": "Güney Afrika", "603": "Gana", "608": "Bahreyn",
"609": "Mauritius", "611": "Fas", "613": "Cezayir",
"615": "Nijerya", "616": "Kenya", "618": "Fildişi Sahili",
"619": "Tunus", "621": "Suriye", "622": "Mısır",
"624": "Libya", "625": "Ürdün", "626": "İran",
"627": "Kuveyt", "628": "Suudi Arabistan", "629": "BAE",
"630": "Katar", "631": "Umman", "64": "Finlandiya",
"690-699": "Çin", "70": "Norveç", "710-719": "Rezerve/Özel Kullanım",
"729": "İsrail", "73": "İsveç", "740": "Guatemala",
"741": "El Salvador", "742": "Honduras", "743": "Nikaragua",
"744": "Kosta Rika", "745": "Panama", "746": "Dominik Cumhuriyeti",
"750": "Meksika", "754-755": "Kanada", "759": "Venezuela",
"76": "İsviçre", "770-771": "Kolombiya", "773": "Uruguay",
"775": "Peru", "777": "Bolivya", "779": "Arjantin",
"780": "Şili", "784": "Paraguay", "786": "Ekvador",
"789-790": "Brezilya", "80-83": "İtalya", "84": "İspanya",
"850": "Küba", "858": "Slovakya", "859": "Çek Cumhuriyeti",
"860": "Sırbistan", "865": "Moğolistan", "867": "Kuzey Kore",
"868-869": "Türkiye", "87": "Hollanda", "880": "Güney Kore",
"884": "Kamboçya", "885": "Tayland", "888": "Singapur",
"890": "Hindistan", "893": "Vietnam", "896": "Pakistan",
"899": "Endonezya", "90-91": "Avusturya", "93": "Avustralya",
"94": "Yeni Zelanda", "955": "Malezya", "958": "Makao",
"977": "Süreli Yayınlar (ISSN)", "978-979": "Kitaplar (ISBN)",
"980": "Para İade Kuponları", "981-984": "Kuponlar", "99": "Kuponlar",
}
# ─────────────────────────────────────────────────────────
# Helper functions
# ─────────────────────────────────────────────────────────
def _extract_price(p):
"""Extract selling price from product, handling both old and Search API formats."""
pr = p.get("price", {})
if isinstance(pr, (int, float)):
return pr
return (pr.get("sellingPrice") or pr.get("discountedPrice")
or pr.get("current") or pr.get("originalPrice")
or pr.get("old") or 0)
def _extract_rating(p):
"""Extract average rating from product."""
rating = p.get("ratingScore") or p.get("rating", 0)
if isinstance(rating, dict):
rating = rating.get("averageRating", 0)
try:
return float(rating) if rating else 0.0
except (ValueError, TypeError):
return 0.0
def _extract_review_count(p):
"""Extract review/comment count from product."""
review_count = 0
try:
review_count = int(p.get("rating_count", 0) or 0)
except (ValueError, TypeError, AttributeError):
pass
if not review_count:
try:
rating_obj = p.get("ratingScore") or p.get("rating", {})
if isinstance(rating_obj, dict):
review_count = int(
rating_obj.get("totalCount", 0)
or rating_obj.get("totalComments", 0)
or 0
)
except (ValueError, TypeError, AttributeError):
review_count = 0
return review_count
def _parse_social_value(value_str):
    """Parse a social proof value such as '642', '1.2k', '1,5B' or '10B+' into an int.

    Suffixes: 'k' and 'B' (Turkish "bin") mean thousand, 'M' means million.
    Returns 0 when the value cannot be parsed.
    """
    multipliers = {"k": 1_000, "b": 1_000, "m": 1_000_000}
    try:
        s = str(value_str).strip().lower().rstrip("+").strip().replace(",", ".")
        if s and s[-1] in multipliers:
            return int(round(float(s[:-1]) * multipliers[s[-1]]))
        return int(s)
    except (ValueError, TypeError):
        return 0
def _detect_barcode_country(prefix_num):
"""Detect country from barcode prefix using BARCODE_COUNTRIES mapping."""
for key, country in BARCODE_COUNTRIES.items():
if "-" in key:
start, end = key.split("-")
try:
range_len = len(start)
prefix_to_check = prefix_num[:range_len] if len(prefix_num) >= range_len else prefix_num
prefix_int = int(prefix_to_check) if prefix_to_check.isdigit() else -1
if int(start) <= prefix_int <= int(end):
return country
except ValueError:
continue
elif key == prefix_num[:len(key)]:
return country
return "Bilinmiyor"
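The range lookup in `_detect_barcode_country` can be exercised on its own. The sketch below is a minimal, self-contained variant that assumes a three-entry excerpt of `BARCODE_COUNTRIES`; the name `detect_country` and the `SAMPLE_PREFIXES` table are illustrative and not part of the module.

```python
# Illustrative excerpt; the real mapping is BARCODE_COUNTRIES above.
SAMPLE_PREFIXES = {
    "80-83": "İtalya",
    "868-869": "Türkiye",
    "87": "Hollanda",
}


def detect_country(prefix: str, table: dict = SAMPLE_PREFIXES) -> str:
    """Return the country for a barcode prefix, or "Bilinmiyor" if unmatched."""
    for key, country in table.items():
        if "-" in key:
            start, end = key.split("-")
            head = prefix[:len(start)]
            # Compare only as many leading digits as the range bounds have.
            if head.isdigit() and int(start) <= int(head) <= int(end):
                return country
        elif prefix.startswith(key):
            return country
    return "Bilinmiyor"


print(detect_country("869"))  # matched by the range key "868-869"
print(detect_country("812"))  # matched by the two-digit range "80-83" via "81"
print(detect_country("123"))  # no entry in the sample table
```

Range keys are compared on as many leading digits as the bounds contain, so a two-digit range such as `80-83` matches any three-digit prefix starting with 80 through 83.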
# ─────────────────────────────────────────────────────────
# 1. normalize_product
# ─────────────────────────────────────────────────────────
def normalize_product(raw_product, category_name, social_details):
"""
Convert a raw product into a flat structure.
Priority: inline socialProofs (Top Rankings) > enrichment API (social_details)
"""
product_id = raw_product.get("contentId") or raw_product.get("id")
price = _extract_price(raw_product)
rating = _extract_rating(raw_product)
review_count = _extract_review_count(raw_product)
brand = raw_product.get("brand", {})
brand_name = (brand.get("name") if isinstance(brand, dict) else brand) or "Bilinmeyen"
# ── Social proof: inline socialProofs first, then enrichment ──
orders, page_views, baskets, favorites = 0, 0, 0, 0
# Inline socialProofs (Top Rankings API, stored in the product file)
social_proofs = raw_product.get("socialProofs", [])
if isinstance(social_proofs, list):
for proof in social_proofs:
proof_type = proof.get("type", "")
parsed = _parse_social_value(proof.get("value", "0"))
if proof_type == "orderCountL3D":
orders = parsed
elif proof_type == "pageViewCount":
page_views = parsed
elif proof_type == "basketCount":
baskets = parsed
elif proof_type == "favoriteCount":
favorites = parsed
# Enrichment API (social.json): fallback when the inline value is missing or 0
# Keys may be str or int (str when loaded from file, int when held in memory)
sp = {}
if product_id and social_details:
sp = (social_details.get(str(product_id))
or social_details.get(int(product_id) if str(product_id).isdigit() else -1)
or {})
if not orders:
orders = sp.get("orders", 0) or 0
if not page_views:
page_views = sp.get("page_views", 0) or 0
if not baskets:
baskets = sp.get("baskets", 0) or 0
if not favorites:
favorites = sp.get("favorites", 0) or 0
# ── Image URL ──
image_url = raw_product.get("imageUrl", "")
if not image_url:
images = raw_product.get("images", [])
image_url = images[0] if isinstance(images, list) and images else ""
# ── Product URL ──
product_url = raw_product.get("url", "")
if not product_url and product_id:
product_url = f"https://www.trendyol.com/p/{product_id}"
# ── Barcode ──
barcode = ""
winner_variant = raw_product.get("winnerVariant", {})
if isinstance(winner_variant, dict):
barcode = winner_variant.get("barcode", "")
# ── Country (origin) ──
country_code = ""
country_name = "Bilinmeyen"
merchant_listings = raw_product.get("merchantListings", [])
if merchant_listings:
custom_values = merchant_listings[0].get("customValues", [])
for cv in custom_values:
if cv.get("key") == "origin":
country_code = cv.get("value", "").upper()
country_name = COUNTRY_NAMES.get(
country_code, f"Diğer ({country_code})" if country_code else "Bilinmeyen"
)
break
return {
"id": product_id,
"name": raw_product.get("name", ""),
"brand": brand_name,
"category": category_name,
"category_name": category_name, # Frontend uyumluluğu (ProductFinderTab, OpportunityTab)
"price": round(price, 2) if price else 0,
"rating": round(rating, 2),
"review_count": review_count,
"orders": orders,
"page_views": page_views,
"baskets": baskets,
"favorites": favorites,
"barcode": barcode,
"country_code": country_code,
"country": country_name,
"image_url": image_url or "https://via.placeholder.com/150",
"url": product_url,
"in_stock": raw_product.get("inStock", False),
}
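The priority rule used above (inline `socialProofs` first, enrichment data only as a fallback for missing or zero values) is easy to check in isolation. This is a minimal sketch that assumes the values are already parsed to integers; `merge_social` and the dict shapes are hypothetical helpers for illustration, not part of this module.

```python
METRICS = ("orders", "page_views", "baskets", "favorites")


def merge_social(inline: dict, enrichment: dict) -> dict:
    """Prefer inline values; fall back to enrichment when inline is missing or 0."""
    merged = {}
    for metric in METRICS:
        merged[metric] = inline.get(metric) or enrichment.get(metric) or 0
    return merged


inline = {"orders": 0, "baskets": 120}
enrichment = {"orders": 35, "baskets": 999, "favorites": 4000}
result = merge_social(inline, enrichment)
print(result)
```

Here `orders` comes from enrichment because the inline value is 0, while `baskets` keeps the inline value even though enrichment reports a different number.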
# ─────────────────────────────────────────────────────────
# 2. calculate_kpis
# ─────────────────────────────────────────────────────────
def calculate_kpis(products):
"""KPI calculation (logic from main.py lines 2182-2262)."""
total_products = len(products)
prices = [p["price"] for p in products if p["price"] > 0]
ratings = [p["rating"] for p in products if p["rating"] > 0]
avg_price = sum(prices) / len(prices) if prices else 0
median_price = float(np.percentile(prices, 50)) if prices else 0
min_price = min(prices) if prices else 0
max_price = max(prices) if prices else 0
avg_rating = sum(ratings) / len(ratings) if ratings else 0
low_rating_count = sum(1 for r in ratings if r < 3.0)
low_rating_rate = (low_rating_count / len(ratings) * 100) if ratings else 0
unique_brands = set(p["brand"] for p in products if p["brand"] and p["brand"] != "Bilinmeyen")
unique_subcategories = set(p["category"] for p in products if p["category"])
return {
"total_products": total_products,
"total_subcategories": len(unique_subcategories),
"total_brands": len(unique_brands),
"avg_price": round(avg_price, 2),
"median_price": round(median_price, 2),
"avg_rating": round(avg_rating, 2),
"low_rating_count": low_rating_count,
"low_rating_rate": round(low_rating_rate, 2),
"min_price": round(min_price, 2),
"max_price": round(max_price, 2),
}
# ─────────────────────────────────────────────────────────
# 3. calculate_charts
# ─────────────────────────────────────────────────────────
def calculate_charts(products):
"""Chart data calculation (logic from main.py lines 2264-3248)."""
prices = [p["price"] for p in products if p["price"] > 0]
total_products = len(products)
# ── Price distribution ──
price_ranges = {"0-100": 0, "100-250": 0, "250-500": 0, "500-1000": 0, "1000+": 0}
for price in prices:
if price < 100:
price_ranges["0-100"] += 1
elif price < 250:
price_ranges["100-250"] += 1
elif price < 500:
price_ranges["250-500"] += 1
elif price < 1000:
price_ranges["500-1000"] += 1
else:
price_ranges["1000+"] += 1
# ── Category and brand groups ──
categories_data = defaultdict(list)
brands_data = defaultdict(int)
for p in products:
categories_data[p["category"]].append(p)
brands_data[p["brand"]] += 1
# ── Top categories (sorted by total orders) ──
top_categories = []
for cat_name, cat_products in categories_data.items():
total_orders = sum(p["orders"] for p in cat_products)
top_categories.append({
"name": cat_name,
"count": len(cat_products),
"total_orders": total_orders,
})
top_categories = sorted(top_categories, key=lambda x: x["total_orders"], reverse=True)[:20]
# ── Top brands ──
top_brands = sorted(
[{"name": brand, "count": count} for brand, count in brands_data.items()],
key=lambda x: x["count"], reverse=True,
)[:20]
# ── Rating distribution ──
rating_distribution = {"0-1": 0, "1-2": 0, "2-3": 0, "3-4": 0, "4-5": 0}
for p in products:
r = p["rating"]
if r < 1:
rating_distribution["0-1"] += 1
elif r < 2:
rating_distribution["1-2"] += 1
elif r < 3:
rating_distribution["2-3"] += 1
elif r < 4:
rating_distribution["3-4"] += 1
else:
rating_distribution["4-5"] += 1
# ── Brand price boxplot (top 10) ──
brand_price_stats = []
for brand_name in [b["name"] for b in top_brands[:10]]:
bp = [p["price"] for p in products if p["brand"] == brand_name and p["price"] > 0]
if bp and len(bp) >= 4:
pcts = np.percentile(bp, [0, 25, 50, 75, 100])
brand_price_stats.append({
"brand": brand_name,
"min": round(float(pcts[0]), 2),
"q1": round(float(pcts[1]), 2),
"median": round(float(pcts[2]), 2),
"q3": round(float(pcts[3]), 2),
"max": round(float(pcts[4]), 2),
"count": len(bp),
})
# ── Scatter plot (price vs rating): sample up to 500 plottable products ──
# Filter before sampling so unpriced or unrated products do not shrink the sample.
plottable = [p for p in products if p["price"] > 0 and p["rating"] > 0]
sampled = random.sample(plottable, min(500, len(plottable)))
scatter_data = [
{
"price": p["price"],
"rating": p["rating"],
"brand": p["brand"],
"in_stock": p["in_stock"],
}
for p in sampled
]
# ── Brand strength score ──
brand_strength_scores = []
for brand_name in [b["name"] for b in top_brands[:10]]:
bp = [p for p in products if p["brand"] == brand_name]
brand_count = len(bp)
brand_share = (brand_count / total_products * 100) if total_products > 0 else 0
brand_ratings = [p["rating"] for p in bp if p["rating"] > 0]
brand_avg_rating = sum(brand_ratings) / len(brand_ratings) if brand_ratings else 0
brand_out_of_stock = sum(1 for p in bp if not p["in_stock"])
stockout_rate = (brand_out_of_stock / brand_count * 100) if brand_count > 0 else 0
strength = brand_share + (brand_avg_rating * 5) - stockout_rate
brand_strength_scores.append({
"brand": brand_name,
"share": round(brand_share, 2),
"avg_rating": round(brand_avg_rating, 2),
"stockout_rate": round(stockout_rate, 2),
"strength_score": round(strength, 2),
})
brand_strength_scores.sort(key=lambda x: x["strength_score"], reverse=True)
# ── Heatmap: Brand × Category ──
top_10_brands = [b["name"] for b in top_brands[:10]]
top_10_cats = [c["name"] for c in top_categories[:10]]
heatmap_data = []
for cat_name in top_10_cats:
cat_products = categories_data.get(cat_name, [])
for brand_name in top_10_brands:
count = sum(1 for p in cat_products if p["brand"] == brand_name)
if count > 0:
heatmap_data.append({"brand": brand_name, "category": cat_name, "value": count})
# ── Category price premium ──
avg_price = sum(prices) / len(prices) if prices else 0
category_price_analysis = []
for cat_name, cat_products in categories_data.items():
cp = [p["price"] for p in cat_products if p["price"] > 0]
if cp:
cat_avg = sum(cp) / len(cp)
cat_median = float(np.percentile(cp, 50))
premium = ((cat_avg - avg_price) / avg_price * 100) if avg_price > 0 else 0
category_price_analysis.append({
"category": cat_name,
"avg_price": round(cat_avg, 2),
"median_price": round(cat_median, 2),
"price_premium": round(premium, 2),
"product_count": len(cp),
"min_price": round(min(cp), 2),
"max_price": round(max(cp), 2),
})
category_price_analysis.sort(key=lambda x: x["price_premium"], reverse=True)
most_expensive = [c for c in category_price_analysis if c["price_premium"] > 0][:10]
most_affordable = [c for c in category_price_analysis if c["price_premium"] < 0][-10:]
most_affordable.reverse()
# ── Origin analysis ──
origin_counts = defaultdict(int)
products_with_origin = 0
for p in products:
if p["country_code"]:
origin_counts[p["country_code"]] += 1
products_with_origin += 1
origin_country_data = sorted(
[
{
"country_code": code,
"country_name": COUNTRY_NAMES.get(code, f"Diğer ({code})"),
"product_count": count,
"percentage": round(count / products_with_origin * 100, 2) if products_with_origin else 0,
}
for code, count in origin_counts.items()
],
key=lambda x: x["product_count"], reverse=True,
)
# ── Barcode analysis ──
barcode_prefixes = defaultdict(int)
barcode_countries_detected = defaultdict(int)
products_with_barcode = 0
for p in products:
bc = p.get("barcode", "")
if bc and len(bc) >= 3:
products_with_barcode += 1
prefix = bc[:3]
barcode_prefixes[prefix] += 1
detected = _detect_barcode_country(prefix)
barcode_countries_detected[detected] += 1
barcode_prefix_data = sorted(
[
{
"prefix": prefix,
"detected_country": _detect_barcode_country(prefix),
"product_count": count,
"percentage": round(count / products_with_barcode * 100, 2) if products_with_barcode else 0,
}
for prefix, count in barcode_prefixes.items()
],
key=lambda x: x["product_count"], reverse=True,
)[:20]
barcode_country_data = sorted(
[
{
"country_name": country,
"product_count": count,
"percentage": round(count / products_with_barcode * 100, 2) if products_with_barcode else 0,
}
for country, count in barcode_countries_detected.items()
],
key=lambda x: x["product_count"], reverse=True,
)
# ── Merchant analysis ──
# Not computed here: normalization drops merchantListings, so
# build_consolidated_report() calls _calculate_merchant_analysis() on the raw
# products and attaches the result as charts["merchant_analysis"].
# ── Build result ──
return {
"price_distribution": price_ranges,
"top_categories": top_categories,
"top_brands": top_brands,
"rating_distribution": rating_distribution,
"brand_price_boxplot": brand_price_stats,
"price_rating_scatter": scatter_data,
"brand_strength": brand_strength_scores,
"brand_category_heatmap": heatmap_data,
"category_price_premium": {
"all_categories": category_price_analysis,
"most_expensive": most_expensive,
"most_affordable": most_affordable,
},
"origin_analysis": {
"countries": origin_country_data,
"top_countries": origin_country_data[:10],
"total_products_with_origin": products_with_origin,
"coverage_percentage": round(products_with_origin / total_products * 100, 2) if total_products else 0,
},
"barcode_analysis": {
"prefixes": barcode_prefix_data,
"countries_from_barcode": barcode_country_data,
"top_countries_from_barcode": barcode_country_data[:10],
"total_products_with_barcode": products_with_barcode,
"coverage_percentage": round(products_with_barcode / total_products * 100, 2) if total_products else 0,
},
}
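The brand strength score computed above is plain arithmetic: market share in percent, plus five times the average rating, minus the stockout rate in percent. A small worked sketch with made-up numbers follows; `brand_strength` is an illustrative helper, not a function in this module.

```python
def brand_strength(brand_count, total_products, avg_rating, out_of_stock):
    """Mirror of the score above: share (%) + 5 * avg rating - stockout rate (%)."""
    share = brand_count / total_products * 100 if total_products else 0
    stockout_rate = out_of_stock / brand_count * 100 if brand_count else 0
    return round(share + avg_rating * 5 - stockout_rate, 2)


# 50 of 1000 products (5% share), average rating 4.2, 5 of 50 out of stock (10%)
score = brand_strength(brand_count=50, total_products=1000, avg_rating=4.2, out_of_stock=5)
print(score)  # 5 + 21 - 10
```

The three terms live on different scales (share and stockout rate range over 0 to 100, the rating term over 0 to 25), so a high stockout rate can push the score below zero.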
def _calculate_merchant_analysis(raw_products, categories_data):
"""
Compute seller analysis from raw product data (requires the merchantListings field).
raw_products: list of raw Trendyol product dicts.
categories_data: {cat_name: [products]}; currently unused, kept for call compatibility.
"""
merchants_data = {}
total_winners = 0
products_with_merchant = 0
for product in raw_products:
merchant_listings = product.get("merchantListings", [])
if not merchant_listings:
continue
ml = merchant_listings[0]
merchant = ml.get("merchant", {})
merchant_id = merchant.get("id")
if not merchant_id:
continue
products_with_merchant += 1
if merchant_id not in merchants_data:
merchant_name = merchant.get("name") or merchant.get("officialName") or f"Satıcı {merchant_id}"
merchants_data[merchant_id] = {
"merchant_id": merchant_id,
"merchant_name": merchant_name,
"product_count": 0,
"total_price": 0,
"winner_count": 0,
}
merchants_data[merchant_id]["product_count"] += 1
price = _extract_price(product)
if price > 0:
merchants_data[merchant_id]["total_price"] += price
if ml.get("isWinner"):
merchants_data[merchant_id]["winner_count"] += 1
total_winners += 1
merchant_list = []
for mid, data in merchants_data.items():
avg_price = data["total_price"] / data["product_count"] if data["product_count"] > 0 else 0
winner_ratio = (data["winner_count"] / data["product_count"] * 100) if data["product_count"] > 0 else 0
merchant_url = None
if data["merchant_name"] and not data["merchant_name"].startswith("Satıcı "):
merchant_url = f"https://www.trendyol.com/magaza/{data['merchant_name'].lower().replace(' ', '-')}-m-{mid}"
merchant_list.append({
"merchant_id": mid,
"merchant_name": data["merchant_name"],
"merchant_url": merchant_url,
"product_count": data["product_count"],
"avg_price": round(avg_price, 2),
"winner_count": data["winner_count"],
"winner_ratio": round(winner_ratio, 2),
})
merchant_list.sort(key=lambda x: x["product_count"], reverse=True)
total_products = len(raw_products)
total_merchants = len(merchants_data)
winner_percentage = (total_winners / products_with_merchant * 100) if products_with_merchant > 0 else 0
return {
"merchants": merchant_list,
"top_merchants": merchant_list[:20],
"total_merchants": total_merchants,
"total_products_with_merchant": products_with_merchant,
"total_winners": total_winners,
"winner_percentage": round(winner_percentage, 2),
"coverage_percentage": round(products_with_merchant / total_products * 100, 2) if total_products else 0,
}
# ─────────────────────────────────────────────────────────
# 4. calculate_insights
# ─────────────────────────────────────────────────────────
def calculate_insights(products):
"""Low-rated products and price anomalies."""
# ── Low rating products ──
low_rating = []
for p in products:
if 0 < p["rating"] < 3.0:
low_rating.append({
"name": p["name"][:50],
"brand": p["brand"],
"rating": p["rating"],
"price": p["price"],
"in_stock": p["in_stock"],
})
low_rating = sorted(low_rating, key=lambda x: x["rating"])[:20]
# ── Anomalies (IQR) ──
prices = [p["price"] for p in products if p["price"] > 0]
anomalies = []
if len(prices) > 4:
q1, q3 = np.percentile(prices, [25, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
for p in products:
if p["price"] > 0 and (p["price"] < lower or p["price"] > upper):
anomalies.append({
"name": p["name"][:50],
"brand": p["brand"],
"price": p["price"],
"type": "expensive" if p["price"] > upper else "cheap",
})
anomalies = sorted(anomalies, key=lambda x: x["price"], reverse=True)[:20]
return {"low_rating_products": low_rating, "anomalies": anomalies}
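The anomaly rule above flags prices outside the fences `q1 - 1.5 * IQR` and `q3 + 1.5 * IQR`. The sketch below reproduces that calculation without NumPy, using linear interpolation between ranks (the default method of `numpy.percentile`); the `percentile` and `iqr_bounds` helpers and the sample prices are illustrative assumptions.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (the default method of numpy.percentile)."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def iqr_bounds(prices):
    """Return (lower, upper) fences at 1.5 * IQR beyond the quartiles."""
    vals = sorted(prices)
    q1, q3 = percentile(vals, 25), percentile(vals, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


prices = [100, 110, 120, 130, 140, 150, 160, 170, 2000]
lower, upper = iqr_bounds(prices)
outliers = [p for p in prices if p < lower or p > upper]
print(lower, upper, outliers)
```

With these nine prices the quartiles are 120 and 160, so the fences are 60 and 220 and only the 2000 entry is flagged, as an "expensive" anomaly.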
# ─────────────────────────────────────────────────────────
# 5. build_consolidated_report (main orchestrator)
# ─────────────────────────────────────────────────────────
def build_consolidated_report(report_id, db, reports_dir, social_data=None):
"""
Load report data, normalize it, run the calculations, and return the result.
Args:
report_id: DB report ID
db: SQLAlchemy session
reports_dir: path to the reports/ directory
social_data: enrichment social.json data (optional; read from file if omitted)
Returns:
Consolidated dashboard dict, or None if the report, its JSON file, or its products are missing
"""
from database import Report
t0 = time.time()
report = db.query(Report).filter(Report.id == report_id).first()
if not report:
return None
if not report.json_file_path or not os.path.exists(report.json_file_path):
return None
# Read the report metadata
with open(report.json_file_path, "r", encoding="utf-8") as f:
report_data = json.load(f)
# Load the social proof data
social_details = {}
if social_data:
social_details = social_data.get("details", {})
else:
social_file = os.path.join(reports_dir, f"enrich_{report_id}", "social.json")
if os.path.exists(social_file):
try:
with open(social_file, "r", encoding="utf-8") as f:
soc = json.load(f)
social_details = soc.get("details", {})
except Exception as e:
log.warning(f"Social proof dosyası okunamadı: {e}")
# ── Load raw products and normalize them ──
normalized_products = []
raw_products_all = [] # Keep the raw data for merchant analysis
for detail in report_data.get("details", []):
if not detail.get("success") or not detail.get("file_path"):
continue
file_path = detail["file_path"]
if not os.path.exists(file_path):
continue
try:
with open(file_path, "r", encoding="utf-8") as f:
cat_data = json.load(f)
raw_products = cat_data.get("products", [])
cat_name_raw = detail.get("category_name", "")
cat_name = re.sub(r'\s+\d+$', '', cat_name_raw)
for raw in raw_products:
# Set category on raw product for load_report_products compatibility
if isinstance(raw.get("category"), dict):
raw["category"]["name"] = cat_name
else:
raw["category"] = {"id": 0, "name": cat_name}
norm = normalize_product(raw, cat_name, social_details)
if norm["price"] and norm["category"]:
normalized_products.append(norm)
raw_products_all.extend(raw_products)
except (json.JSONDecodeError, OSError, KeyError) as e:
log.warning(f"Kategori dosyası okunamadı: {file_path}: {e}")
continue
if not normalized_products:
log.warning(f"Rapor {report_id} için ürün bulunamadı")
return None
# ── Calculations ──
kpis = calculate_kpis(normalized_products)
charts = calculate_charts(normalized_products)
insights = calculate_insights(normalized_products)
# Merchant analysis (requires raw data)
charts["merchant_analysis"] = _calculate_merchant_analysis(raw_products_all, {})
elapsed = time.time() - t0
log.info(f"Rapor {report_id} konsolide edildi: {len(normalized_products)} ürün, {elapsed:.2f}s")
return {
"metadata": {
"report_id": report_id,
"report_name": report.name,
"created_at": report.created_at.isoformat() if report.created_at else None,
"total_products": len(normalized_products),
"total_categories": kpis["total_subcategories"],
"consolidated_at": datetime.now().isoformat(),
},
"report_id": report_id,
"report_name": report.name,
"products": normalized_products,
"all_products": normalized_products, # Geriye uyumluluk (frontend "all_products" bekliyor)
"kpis": kpis,
"charts": charts,
"insights": insights,
}
# ─────────────────────────────────────────────────────────
# 6. save / load
# ─────────────────────────────────────────────────────────
def save_consolidated_report(report_id, data, reports_dir):
"""Save the consolidated data as reports/report_{id}_data.json."""
path = os.path.join(reports_dir, f"report_{report_id}_data.json")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False)
log.info(f"Konsolide rapor kaydedildi: {path}")
return path
def load_consolidated_report(report_id, reports_dir):
"""Read the consolidated file if it exists; otherwise return None."""
path = os.path.join(reports_dir, f"report_{report_id}_data.json")
if os.path.exists(path):
try:
with open(path, "r", encoding="utf-8") as f:
return json.load(f)
except (json.JSONDecodeError, OSError) as e:
log.warning(f"Konsolide dosya okunamadı: {path}: {e}")
return None
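If a dashboard request can arrive while the consolidated file is still being written, a reader may see a half-written JSON document. One way to avoid that is to write to a temporary file and rename it into place. The sketch below assumes that approach; `save_json_atomic` is a hypothetical helper, not the `save_consolidated_report` function from this commit.

```python
import json
import os
import tempfile


def save_json_atomic(path: str, data: dict) -> str:
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return path


with tempfile.TemporaryDirectory() as reports_dir:
    target = os.path.join(reports_dir, "report_1_data.json")
    save_json_atomic(target, {"report_id": 1, "products": []})
    with open(target, encoding="utf-8") as f:
        loaded = json.load(f)
    leftovers = [n for n in os.listdir(reports_dir) if n.endswith(".tmp")]
print(loaded, leftovers)
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes `os.replace` atomic.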

@@ -6,6 +6,9 @@ from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship
from datetime import datetime
import os
from logging_config import get_logger
log = get_logger("db")
# PostgreSQL database - configurable via environment variable
# Default: Local PostgreSQL for development
@@ -26,6 +29,7 @@ class Category(Base):
parent_id = Column(Integer, ForeignKey('categories.id'), nullable=True)
trendyol_category_id = Column(Integer, nullable=True)
trendyol_url = Column(String, nullable=True)
path_model = Column(String, nullable=True) # URL slug for search API (e.g. "elbise-x-c56")
is_active = Column(Boolean, default=True)
created_at = Column(DateTime, default=datetime.utcnow)
@@ -86,7 +90,7 @@ class EnrichmentError(Base):
def init_db():
"""Initialize database - create tables"""
Base.metadata.create_all(bind=engine)
-print("Database initialized successfully!")
+log.info("Database initialized successfully")
def get_db():

@@ -8,6 +8,9 @@ from pytrends.request import TrendReq
from typing import Dict, Optional
from datetime import datetime, timedelta
import time
from logging_config import get_logger
log = get_logger("trends")
class GoogleTrendsCache:
@@ -135,12 +138,12 @@ def fetch_google_trends(product_name: str, retries: int = 3) -> Dict:
except Exception as e:
error_msg = str(e)
-print(f"Google Trends API Error (attempt {attempt + 1}/{retries}): {error_msg}")
+log.warning(f"Google Trends API Error (attempt {attempt + 1}/{retries}): {error_msg}")
# Rate limit error - wait longer
if '429' in error_msg or 'rate' in error_msg.lower():
wait_time = 5 * (attempt + 1) # 5, 10, 15 seconds
-print(f"Rate limited. Waiting {wait_time} seconds...")
+log.warning(f"Rate limited. Waiting {wait_time} seconds...")
time.sleep(wait_time)
continue

197
backend/logging_config.py Normal file
@@ -0,0 +1,197 @@
"""
Structured Logging Configuration for Trendyol Product Dashboard
Provides:
- JSON structured logs to file (for machine parsing)
- Colored console logs (for human reading)
- Correlation ID tracking per request/report
- Rotating file handlers with size limits
- Timing context manager for operation profiling
"""
import logging
import logging.handlers
import json
import os
import time
from contextvars import ContextVar
from contextlib import contextmanager
from datetime import datetime, timezone
# ---------------------------------------------------------------------------
# Context variables for log correlation
# ---------------------------------------------------------------------------
_correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")
_report_id: ContextVar[str] = ContextVar("report_id", default="-")
def set_correlation_id(cid: str):
_correlation_id.set(cid)
def get_correlation_id() -> str:
return _correlation_id.get()
def set_report_id(rid):
_report_id.set(str(rid) if rid is not None else "-")
def get_report_id() -> str:
return _report_id.get()
# ---------------------------------------------------------------------------
# JSON Formatter (file output)
# ---------------------------------------------------------------------------
class JSONFormatter(logging.Formatter):
"""Structured JSON log formatter for file output."""
def format(self, record: logging.LogRecord) -> str:
log_entry = {
# Use the time the record was created, not the time it is formatted
"ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"msg": record.getMessage(),
"correlation_id": get_correlation_id(),
"report_id": get_report_id(),
}
# Add extra fields if present
for key in ("url", "status_code", "response_time_ms", "response_size",
"error_type", "duration_ms", "cb_state", "failures",
"batch_size", "product_count", "cache_size"):
val = getattr(record, key, None)
if val is not None:
log_entry[key] = val
# Add exception info
if record.exc_info and record.exc_info[0] is not None:
log_entry["exception"] = self.formatException(record.exc_info)
return json.dumps(log_entry, ensure_ascii=False, default=str)
# ---------------------------------------------------------------------------
# Console Formatter (colored, human-readable)
# ---------------------------------------------------------------------------
_LEVEL_COLORS = {
"DEBUG": "\033[36m", # cyan
"INFO": "\033[32m", # green
"WARNING": "\033[33m", # yellow
"ERROR": "\033[31m", # red
"CRITICAL": "\033[1;31m", # bold red
}
_RESET = "\033[0m"
class ConsoleFormatter(logging.Formatter):
"""Colored, human-readable console formatter."""
def format(self, record: logging.LogRecord) -> str:
color = _LEVEL_COLORS.get(record.levelname, "")
ts = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
level = record.levelname[0] # D, I, W, E, C
report = get_report_id()
report_tag = f" [r:{report}]" if report != "-" else ""
msg = record.getMessage()
base = f"{color}{ts} [{level}]{report_tag} {msg}{_RESET}"
if record.exc_info and record.exc_info[0] is not None:
base += "\n" + self.formatException(record.exc_info)
return base
# ---------------------------------------------------------------------------
# Setup function
# ---------------------------------------------------------------------------
def setup_logging(log_dir: str = None):
"""
Configure the entire logging system. Call once at startup.
Creates:
- logs/trendyol.log (all levels, JSON, 10MB x 5 rotation)
- logs/errors.log (WARNING+, JSON, 10MB x 3 rotation)
- console output (INFO+, colored)
"""
if log_dir is None:
log_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "logs")
os.makedirs(log_dir, exist_ok=True)
root = logging.getLogger("trendyol")
root.setLevel(logging.DEBUG)
# Prevent duplicate handlers on reload
if root.handlers:
return
json_fmt = JSONFormatter()
console_fmt = ConsoleFormatter()
# 1. Main log file — all levels, JSON
main_handler = logging.handlers.RotatingFileHandler(
os.path.join(log_dir, "trendyol.log"),
maxBytes=10 * 1024 * 1024, # 10 MB
backupCount=5,
encoding="utf-8",
)
main_handler.setLevel(logging.DEBUG)
main_handler.setFormatter(json_fmt)
root.addHandler(main_handler)
# 2. Error log file — WARNING+, JSON
error_handler = logging.handlers.RotatingFileHandler(
os.path.join(log_dir, "errors.log"),
maxBytes=10 * 1024 * 1024,
backupCount=3,
encoding="utf-8",
)
error_handler.setLevel(logging.WARNING)
error_handler.setFormatter(json_fmt)
root.addHandler(error_handler)
# 3. Console — INFO+, colored
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(console_fmt)
root.addHandler(console_handler)
# Quiet noisy libraries
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("sqlalchemy").setLevel(logging.WARNING)
logging.getLogger("sqlalchemy.engine").setLevel(logging.WARNING)
# ---------------------------------------------------------------------------
# Logger factory
# ---------------------------------------------------------------------------
def get_logger(name: str) -> logging.Logger:
"""Get a namespaced logger: trendyol.<name>"""
return logging.getLogger(f"trendyol.{name}")
# ---------------------------------------------------------------------------
# Timing context manager
# ---------------------------------------------------------------------------
@contextmanager
def log_timing(logger: logging.Logger, operation: str, level=logging.INFO, **extra):
    """Context manager that logs operation duration, including when the body raises."""
    start = time.monotonic()
    failed = False
    try:
        yield
    except BaseException:
        failed = True
        raise
    finally:
        elapsed_ms = round((time.monotonic() - start) * 1000, 1)
        logger.log(
            logging.ERROR if failed else level,
            f"{operation} {'failed after' if failed else 'completed in'} {elapsed_ms}ms",
            extra={"duration_ms": elapsed_ms, **extra},
        )
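The correlation mechanism above relies on `ContextVar` values being read at format time, so any log call made while an ID is set carries that ID. A minimal sketch of the idea follows, assuming a stripped-down formatter and an in-memory stream; `MiniJSONFormatter`, the `demo.correlation` logger name, and the `req-42` ID are illustrative only.

```python
import io
import json
import logging
from contextvars import ContextVar

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class MiniJSONFormatter(logging.Formatter):
    """Stripped-down formatter: message plus the current correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(MiniJSONFormatter())
demo_log = logging.getLogger("demo.correlation")
demo_log.setLevel(logging.INFO)
demo_log.propagate = False  # keep the demo output out of any root handlers
demo_log.addHandler(handler)

token = correlation_id.set("req-42")   # e.g. at the start of a request
demo_log.info("report build started")
correlation_id.reset(token)            # restore the previous value afterwards
demo_log.info("idle")

entries = [json.loads(line) for line in stream.getvalue().splitlines()]
print(entries)
```

Resetting with the token returned by `set()` restores the previous value, which matters when requests are nested or handled concurrently in the same process.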

File diff suppressed because it is too large

@@ -10,6 +10,9 @@ import math
import os
from typing import Dict, List, Any, Optional
from datetime import datetime
from logging_config import get_logger
log = get_logger("scraper")
class TrendyolScraper:
@@ -55,7 +58,7 @@ class TrendyolScraper:
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
-print(f"Sayfa {page} error: {e}")
+log.warning(f"Sayfa {page} error: {e}")
return None
def get_total_count(self) -> int:
@@ -96,7 +99,7 @@ class TrendyolScraper:
# Sayfa sayısını hesapla
total_pages = self.calculate_total_pages(total_count, max_pages)
-print(f"📦 Kategori {self.category_id}: {total_count} ürün, {total_pages} sayfa çekilecek")
+log.info(f"Kategori {self.category_id}: {total_count} ürün, {total_pages} sayfa çekilecek")
# Sayfaları çek
all_products = []
@@ -105,7 +108,7 @@ class TrendyolScraper:
data = self.fetch_page(page)
if not data or not data.get('isSuccess'):
-print(f"⚠️ Sayfa {page} atlandı")
+log.warning(f"Sayfa {page} atlandı")
continue
products = data.get('products', [])
@@ -144,7 +147,7 @@ class TrendyolScraper:
return True
except Exception as e:
-print(f"Dosya kaydetme hatası: {e}")
+log.error(f"Dosya kaydetme hatası: {e}")
return False
def get_category_info(self) -> Optional[Dict[str, Any]]:
@@ -157,6 +160,112 @@ class TrendyolScraper:
return data.get('categoryInfo', {})
class TrendyolSearchScraper:
"""Fetches products via the Trendyol Search API; works for all category types (-c and -s)."""
API_BASE_URL = "https://apigw.trendyol.com/discovery-sfint-search-service/api/search/products"
def __init__(self, path_model: str, page_size: int = 24):
self.path_model = path_model
self.page_size = page_size
self.headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
"Accept": "application/json",
"Referer": f"https://www.trendyol.com/{path_model}",
"Origin": "https://www.trendyol.com"
}
self.cookies = {
"storefrontId": "1",
"language": "tr",
"countryCode": "TR"
}
def fetch_page(self, page: int) -> Optional[Dict[str, Any]]:
"""Fetch a single page."""
params = {
"pathModel": self.path_model,
"pi": page,
"ps": self.page_size,
"channelId": 1,
"storefrontId": 1,
"culture": "tr-TR"
}
try:
response = requests.get(
self.API_BASE_URL,
params=params,
headers=self.headers,
cookies=self.cookies,
timeout=15
)
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
log.warning(f"Search API sayfa {page} error ({self.path_model}): {e}")
return None
def fetch_all_products(self, delay: float = 1.0, max_pages: int = 10) -> List[Dict[str, Any]]:
"""Fetch and normalize all products (max_pages=10 x page_size=24 = at most 240 products)."""
first = self.fetch_page(1)
if not first:
return []
total = first.get("total", 0) or first.get("totalCount", 0) or first.get("roughTotal", 0)
raw_products = first.get("products", [])
if total == 0 and not raw_products:
return []
# Even if total is 0, process at least one page when products were returned
if total == 0 and raw_products:
total = len(raw_products)
total_pages = min(math.ceil(total / self.page_size), max_pages)
log.info(f"Search API {self.path_model}: {total} ürün, {total_pages} sayfa çekilecek")
for page in range(2, total_pages + 1):
data = self.fetch_page(page)
if data and data.get("products"):
raw_products.extend(data["products"])
if page < total_pages:
time.sleep(delay)
return [_normalize_search_product(p) for p in raw_products]
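The page-count arithmetic in `fetch_all_products` can be checked in isolation: enough pages to cover the reported total, capped at `max_pages`. A small sketch follows, using the same defaults as the method; `pages_to_fetch` is an illustrative helper name.

```python
import math


def pages_to_fetch(total: int, page_size: int = 24, max_pages: int = 10) -> int:
    """Number of pages to request: enough to cover `total`, capped at `max_pages`."""
    return min(math.ceil(total / page_size), max_pages)


print(pages_to_fetch(100))    # 100 / 24 rounds up to 5 pages
print(pages_to_fetch(5000))   # capped at 10 pages, i.e. at most 240 products
print(pages_to_fetch(24))     # exactly one full page
```

With the defaults, any category reporting more than 240 products is truncated to its first 240 results.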
def _normalize_search_product(raw: dict) -> dict:
"""Convert a Search API product into the format the existing system expects."""
brand = raw.get("brand", {})
if isinstance(brand, str):
brand = {"name": brand}
price = raw.get("price", {})
if isinstance(price, (int, float)):
price = {"sellingPrice": price, "originalPrice": price}
elif isinstance(price, dict) and "sellingPrice" not in price:
# Search API returns current/discountedPrice/originalPrice — map to sellingPrice
price["sellingPrice"] = price.get("discountedPrice") or price.get("current") or price.get("originalPrice") or price.get("old") or 0
rating = raw.get("ratingScore", {})
if rating is None:
rating = {}
return {
"id": raw.get("id") or raw.get("contentId"),
"name": raw.get("name", ""),
"brand": brand,
"price": price,
"ratingScore": rating,
"url": raw.get("url", ""),
"imageUrl": raw.get("image", raw.get("imageUrl", "")),
"merchantListings": raw.get("merchantListings", []),
"winnerVariant": raw.get("winnerVariant", {}),
"socialProofs": raw.get("socialProofs", []),
"categoryId": raw.get("categoryId"),
"categoryName": raw.get("categoryName"),
}


def scrape_category(category_id: int, category_name: str, output_dir: str = "../categories") -> Dict[str, Any]:
    """
    Scrapes a single category
@@ -227,9 +336,7 @@ def scrape_multiple_categories(categories: List[tuple], delay: float = 2.0) -> D
     }
     for i, (cat_id, cat_name) in enumerate(categories, 1):
-        print(f"\n{'='*80}")
-        print(f"📂 [{i}/{len(categories)}] {cat_name} (ID: {cat_id})")
-        print('='*80)
+        log.info(f"[{i}/{len(categories)}] {cat_name} (ID: {cat_id})")
         result = scrape_category(cat_id, cat_name)
         results["details"].append(result)
@@ -237,10 +344,10 @@ def scrape_multiple_categories(categories: List[tuple], delay: float = 2.0) -> D
         if result["success"]:
             results["successful"] += 1
             results["total_products"] += result["total_products"]
-            print(f"Success: {result['total_products']} products")
+            log.info(f"Success: {result['total_products']} products")
         else:
             results["failed"] += 1
-            print(f"Error: {result['error']}")
+            log.error(f"Error: {result['error']}")
         # Wait between categories
         if i < len(categories):