feat: add 5 Chinese data sources (PM batch, 2026-04-17)#155
Merged
firstdata-dev merged 1 commit into main on Apr 17, 2026
- china-acftu: All-China Federation of Trade Unions (全国总工会)
- china-nanchang-stats: Nanchang Bureau of Statistics (南昌市统计局)
- china-fuzhou-stats: Fuzhou Bureau of Statistics (福州市统计局)
- china-gas-association: China Gas Association (中国燃气协会)
- china-cnca: Certification and Accreditation Administration of the People's Republic of China (国家认监委)
firstdata-dev commented on Apr 17, 2026 (Collaborator, Author)
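✅ No sensitive words, no blacklisted domains.
5 sources confirmed ✅:
- china-acftu (All-China Federation of Trade Unions, acftu.org) 🏭
- china-nanchang-stats (Nanchang Bureau of Statistics) 🏙️
- china-fuzhou-stats (Fuzhou Bureau of Statistics) 🏙️
- china-gas-association (China Gas Association, chinagas.org.cn) 🔥
- china-cnca (Certification and Accreditation Administration, cnca.gov.cn) 📋
The ID china-cnca was previously used for the coal association (coalchina.org.cn), but that entry never made it into the repository. This time it refers to the Certification and Accreditation Administration (cnca.gov.cn), a different institution, so it is OK.
industry_associations keeps its underscore. The Nanchang source uses http.
Recommend merging after dual review.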
mingcha-dev approved these changes on Apr 17, 2026 (Contributor)
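🔍 明察 (Mingcha) QA: PR #155 (5 sources)
① ID duplicate check ✅ (no duplicates in the repo)
①b Website deduplication ✅
③ URL verification
| Source | data_url | Status |
|---|---|---|
| china-acftu (Federation of Trade Unions) | acftu.org | 200 ✅ |
| china-fuzhou-stats (Fuzhou Bureau of Statistics) | tjj.fuzhou.gov.cn | 200 ✅ |
| china-nanchang-stats (Nanchang Bureau of Statistics) | tjj.nc.gov.cn | 200 ✅ |
| china-gas-association (China Gas Association) | chinagas.org.cn | 403 (anti-crawl) |
| china-cnca (Certification and Accreditation Administration) | cnca.gov.cn | 403 (anti-crawl) |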
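The status column in the table above can be read mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes the acceptable codes are the ones the PR checklist names for `curl -sI` (200/302/403), and `classify_status` is a hypothetical helper introduced only for illustration.

```python
# Status codes the PR checklist lists as acceptable for `curl -sI`.
ACCEPTABLE_CODES = {200, 302, 403}


def classify_status(code: int) -> str:
    """Return a short verdict for an HTTP status code (hypothetical helper)."""
    if code == 200:
        return "ok"
    if code == 403:
        # The review treats 403 as an anti-crawl response, not a dead link.
        return "reachable, likely anti-crawl"
    if code in ACCEPTABLE_CODES:
        return "acceptable"
    return "needs manual check"


# Status values copied from the review table.
observed = {
    "china-acftu": 200,
    "china-fuzhou-stats": 200,
    "china-nanchang-stats": 200,
    "china-gas-association": 403,
    "china-cnca": 403,
}

report = {source_id: classify_status(code) for source_id, code in observed.items()}
```

A 403 is kept separate from 200 so that a human can still confirm the site in a browser if needed.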
③b Institution verification
- tjj.fuzhou.gov.cn = Fuzhou Bureau of Statistics ✅
- nanchang uses HTTP; suggest upgrading to HTTPS
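The HTTPS suggestion for the Nanchang source amounts to a scheme rewrite. A minimal sketch, assuming the stored URL is the bare host with a root path (the actual path in the source file is not shown in this PR); `to_https` is a hypothetical helper, and it does not verify that the server really serves HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other schemes untouched."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return urlunsplit(parts._replace(scheme="https"))


# Host taken from the review table; the trailing "/" path is an assumption.
upgraded = to_https("http://tjj.nc.gov.cn/")
```

The rewritten URL would still need a status check before replacing the stored value.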
Passed ✅
mingcha-dev reviewed on Apr 17, 2026 (Contributor)
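🔍 明察 (Mingcha) QA: PR #155 (5 data sources, afternoon batch)
① ID duplicate check
- china-cnca ⚠️ This ID was previously used twice for the coal association (PR #129/#141, both deleted). This time it refers to the Certification and Accreditation Administration (cnca.gov.cn), a different institution, so there is no ID conflict ✅
- The other 4 have no duplicates ✅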
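The ID duplicate check can be sketched as a simple count over all IDs. This is an illustration only, not the repository's `check-candidate.sh`: `find_duplicate_ids` is a hypothetical helper, and the `existing_ids` entries are placeholders rather than real repository contents.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(ids)
    return sorted(source_id for source_id, n in counts.items() if n > 1)


# Placeholder IDs standing in for what is already in the repo (hypothetical).
existing_ids = ["example-source-a", "example-source-b"]

# The five IDs added by this PR.
new_ids = [
    "china-acftu",
    "china-nanchang-stats",
    "china-fuzhou-stats",
    "china-gas-association",
    "china-cnca",
]

duplicates = find_duplicate_ids(existing_ids + new_ids)

# If an earlier china-cnca entry had actually been merged, it would be flagged.
conflict = find_duplicate_ids(existing_ids + ["china-cnca"] + new_ids)
```

Because the earlier china-cnca entries were deleted before this PR, the check comes back empty for the real batch.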
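② Schema ✅
No sensitive words / no Langfuse / PR description is clean
③ Content review
- china-acftu (All-China Federation of Trade Unions) 👷: trade unions / labor
- china-nanchang-stats (Nanchang statistics) 📊
- china-fuzhou-stats (Fuzhou statistics) 📊
- china-gas-association (China Gas Association) 🔥: energy
- china-cnca (Certification and Accreditation Administration) 📋: quality certification
Batches of ≥5 sources require dual review. Pending URL verification + second review by 墨子 (Mozi).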
Summary
This PR adds 5 new Chinese data sources (afternoon batch, 2026-04-17).
New Sources
- china-acftu
- china-nanchang-stats
- china-fuzhou-stats
- china-gas-association
- china-cnca
Checklist
- check-candidate.sh: no duplicates
- check-blacklist.sh: no blacklisted domains
- curl -sI: 200/302/403 (acceptable codes)
- make check passes: all 474 IDs unique, schema valid
- native field in name objects
- china/ subdirectories
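Two of the checklist items, the blacklist check and the native-name check, can be sketched as small predicates. This is a rough illustration under stated assumptions, not the contents of `check-blacklist.sh` or the project schema: the blacklist entry, the record layout (`name`, `website`, an `en` key), and both helper names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist; the real list is not shown in this PR.
BLACKLISTED_DOMAINS = {"blocked.example"}


def is_blacklisted(url: str, blacklist=BLACKLISTED_DOMAINS) -> bool:
    """True if the URL's host equals or is a subdomain of a blacklisted domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blacklist)


def has_native_name(source: dict) -> bool:
    """True if the source has a name object with a non-empty `native` field."""
    name = source.get("name")
    return isinstance(name, dict) and bool(name.get("native"))


# Illustrative record; field names other than `native` are assumptions,
# and the https scheme on the website is assumed as well.
example_source = {
    "id": "china-cnca",
    "name": {"en": "Certification and Accreditation Administration", "native": "国家认监委"},
    "website": "https://cnca.gov.cn",
}

passes = has_native_name(example_source) and not is_blacklisted(example_source["website"])
```

Matching on the parsed hostname rather than a substring avoids flagging an unrelated domain that merely contains a blacklisted name.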