Skip to content

feat: add 5 Chinese data sources (PM batch, 2026-04-17)#155

Merged
firstdata-dev merged 1 commit intomainfrom
feat/add-china-sources-20260417-pm
Apr 17, 2026
Merged

feat: add 5 Chinese data sources (PM batch, 2026-04-17)#155
firstdata-dev merged 1 commit intomainfrom
feat/add-china-sources-20260417-pm

Conversation

@firstdata-dev
Copy link
Copy Markdown
Collaborator

Summary

This PR adds 5 new Chinese data sources (afternoon batch, 2026-04-17).

New Sources

ID Organization URL Status
china-acftu 全国总工会 (All-China Federation of Trade Unions) https://www.acftu.org ✅ 200
china-nanchang-stats 南昌市统计局 (Nanchang Bureau of Statistics) http://tjj.nc.gov.cn ✅ 200
china-fuzhou-stats 福州市统计局 (Fuzhou Bureau of Statistics) https://tjj.fuzhou.gov.cn ✅ 200
china-gas-association 中国燃气协会 (China Gas Association) https://www.chinagas.org.cn ✅ 403
china-cnca 国家认监委 (China National Accreditation Service) https://www.cnca.gov.cn ✅ 403

Checklist

  • All IDs checked with check-candidate.sh — no duplicates
  • All files checked with check-blacklist.sh — no blacklisted domains
  • All URLs verified with curl -sI — 200/302/403 (acceptable codes)
  • make check passes — all 474 IDs unique, schema valid
  • Exactly 5 Chinese data sources added
  • No native field in name objects
  • Domain names use lowercase + hyphens (no underscores)
  • Files placed in correct china/ subdirectories

- china-acftu: All-China Federation of Trade Unions (全国总工会)
- china-nanchang-stats: Nanchang Bureau of Statistics (南昌市统计局)
- china-fuzhou-stats: Fuzhou Bureau of Statistics (福州市统计局)
- china-gas-association: China Gas Association (中国燃气协会)
- china-cnca: China National Accreditation Service for Conformity Assessment (国家认监委)
Copy link
Copy Markdown
Collaborator Author

@firstdata-dev firstdata-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ 无敏感词,无黑名单域名。

5 个源确认 ✅:

  • china-acftu(全国总工会 acftu.org)🏭
  • china-nanchang-stats(南昌统计局)🏙️
  • china-fuzhou-stats(福州统计局)🏙️
  • china-gas-association(燃气协会 chinagas.org.cn)🔥
  • china-cnca(国家认监委 cnca.gov.cn)📋

⚠️ china-cnca 这个 ID 之前被煤炭用过(coalchina.org.cn),但从未入库。这次是认监委(cnca.gov.cn),不同机构,OK。
⚠️ industry_associations 下划线继续。南昌用 http。

建议双审后合并。

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #155(5 源)

① ID 查重 ✅(repo 中无重复)

①b Website 去重 ✅

⚠️ china-acftu 之前 PR #129 因超时移除,这次 website 从 acftu.org.cn → acftu.org(200 ✅),可接受。
⚠️ china-cnca 之前 PR #129 因超时移除,这次 cnca.gov.cn 403——中国政府站可接受。

③ URL 验证

data_url 状态
china-acftu(总工会) acftu.org 200 ✅
china-fuzhou-stats(福州统计局) tjj.fuzhou.gov.cn 200 ✅
china-nanchang-stats(南昌统计局) tjj.nc.gov.cn 200 ✅
china-gas-association(燃气协会) chinagas.org.cn 403(anti-crawl)
china-cnca(认证认可监督管理委员会) cnca.gov.cn 403(anti-crawl)

③b 机构验证

  • tjj.fuzhou.gov.cn = 福州市统计局 ✅
  • nanchang HTTP,可升 HTTPS 建议

通过 ✅

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #155(5 个数据源,下午批次)

① ID 查重

  • china-cnca ⚠️ 此 ID 曾被煤炭协会用过两次(PR #129/#141,均已删除),这次是 认证认可监管委 cnca.gov.cn — 不同机构,ID 无冲突 ✅
  • 其余 4 个无重复 ✅

② Schema ✅

无敏感词 / 无 Langfuse / PR 描述干净

③ 内容审查

  • china-acftu(全国总工会)👷 — 工会/劳动
  • china-nanchang-stats(南昌统计)📊
  • china-fuzhou-stats(福州统计)📊
  • china-gas-association(燃气协会)🔥 — 能源
  • china-cnca(认证认可监管委)📋 — 质量认证

⚠️ cnca 历史上两次用于煤炭但本次确认是 cnca.gov.cn(政府机构),合法。
≥5 源需双审。Pending URL 验证 + 墨子二审。

@firstdata-dev firstdata-dev merged commit e15ae04 into main Apr 17, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants