A new o3 trick surprises even Sam Altman: AI pinpoints where badly degraded old photos were taken (full prompt included)

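Whatever the task, once AI joins the fight, the contest is over before long.

The most outrageous part: it can now identify a photo so blurry and degraded that you cannot tell what it shows. No need to doubt it, the photo really is this blurry.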

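From this degraded image, o3 offered several possibilities:

(1) Open ground about 5 km upstream on the Ganges

(2) A muddy stretch of the lower Mississippi River

(3) A stretch of the Yellow River

(4) A stretch of the Mekong River

If you were handed every tool available, could you work out exactly where it is?

The correct answer is the Mekong stretch. The photo was taken in 2008, so the wear on it is genuine.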

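"Guess the location from a picture" is actually a popular game: GeoGuessr. It shows you a random Google Street View image, and you have to work out the exact location from what is in it.

The game has a sizable following: many enthusiasts compete on its leaderboards, and there are even prize tournaments.

One way ordinary players approach GeoGuessr is to run a Google image search to get a rough area, then confirm it step by step with Google Earth and Street View.

Now, however, GeoGuessr is no longer a game played only among humans: o3 has joined in and beaten top players outright.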

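Sam Altman's reaction: honestly, I didn't see this coming either.

When image reasoning first launched, many users immediately saw its potential applications, location identification among them.

Recently, users found that o3 reasons remarkably well even from very blurry input. It does so with EXIF extraction and similar shortcuts disabled, reaching accurate conclusions purely by reasoning about details in the image.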

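I have to say, this prompt is astonishing. After studying it closely, I think it reads like a veteran GeoGuessr player writing down years of hard-won technique and passing it on to o3, while forbidding it from "cheating" with tools such as Google Earth.

For example, the prompt demands that o3 be extremely, painstakingly careful: it should "pay attention to details that vary by region, such as sidewalk slab size, curbs, contractor stamps, power lines, and fence construction", and weigh factors such as daylight, shadows, and above all slope.

In later hands-on tests, all of these proved very valuable, and o3's overall performance improved greatly as a result.

Is it really that magical? I fed this absurdly long prompt to o3, and it replied: challenge accepted.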

The "Guess Where I Am" challenge

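I did not start with anything too hard for the first image, though it is still fairly hard: a night shot of an elevated bridge with no buildings for reference, no legible license plates, and a bus route number that is blurry as well. I still call it "not hard" because the upper right corner shows half of a sign in metal lettering, though only half.

To make sure the model could not read any EXIF data, I took a screenshot of the photo; the gray bars on both sides are left over from the screenshot.

The night shot causes plenty of difficulty, and many of the methods in o3's reasoning simply cannot be applied. Still, the correct answer already appeared among the first-round candidates, so I let it continue.

Unfortunately, it narrowly missed the correct answer in the end: it had clearly considered Guangzhou's Haizhu Bridge, yet it still chose Waibaidu Bridge.

One possibility is that reading text, Chinese characters in particular, is still somewhat hard for o3. The same weakness shows up in image and poster generation tasks.

Either way, with half a Chinese character visible, this should not count as hard. That result briefly killed my interest in the next task: the following image has no signage, no landmark buildings, not even half a character.

If it cannot even recognize Haizhu Bridge, can it really handle this? Well, the result left me stunned.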

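This is indeed the InD art festival held during this year's May Day holiday. The photo was taken during setup, though, so there is no visible logo and the scene is a mess. I did not expect it to be identified anyway.

This photo also clearly shows that chat history, along with the memory a model has accumulated about a user over time, becomes part of its reasoning, and can even "contaminate" that reasoning to some degree.

For example, in the next recognition task, which I consider the hardest, memory turned into a distraction during reasoning.

This image not only lacks every usual clue, it was also shot from indoors looking out, which makes working backward to the location even harder.

In fact, a fairly close answer came up in the first round of candidates. In the reasoning that followed, however, o3 was led astray and insisted that this was still near the TIT Creative Park. Even when I supplied a second, clearer image, it would not budge.

How to put it: at that point it was hard to keep a straight face.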

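As soon as o3's image recognition appeared, it was seen as a major privacy risk: doxxing has never been this easy or this accurate. Given how much personal information already leaks, locating a real person from a single casual snapshot is not out of the question.

But this test exposed another problem: when an AI swears it is right, do you put it down to hallucination, or do you slowly let it persuade you?

Back to the Haizhu Bridge image from the start of the test. After its wrong guess, I gave it a hint: look at that half character, doesn't it look like 「海」 (the "Hai" in Haizhu)?

The model did consider it, then produced a detailed table laying out its position, and firmly refused to change its answer.

Looking at that table, I hesitated a little and went back to double-check the image: had I uploaded the wrong file? Had I accidentally sent it a picture of Waibaidu Bridge?

So who is right, the model or me?

A photo that should serve as an alibi can instead be turned into "proof of presence". A place I have never visited is forced into my life story, which gets more unsettling the longer you think about it. If a picture of me on the moon turned up one day, it could probably convince me that I really went.

Finally, if you want to try this magic yourself, the full prompt is below. One caveat: it is for personal experimentation only; prying into other people's privacy is wrong.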

You are playing a one-round game of GeoGuessr. Your task: from a single still image, infer the most likely real-world location. Note that unlike in the GeoGuessr game, there is no guarantee that these images are taken somewhere Google’s Streetview car can reach: they are user submissions to test your image-finding savvy. Private land, someone’s backyard, or an offroad adventure are all real possibilities (though many images are findable on streetview).

Be aware of your own strengths and weaknesses: following this protocol, you usually nail the continent and country. You more often struggle with exact location within a region, and tend to prematurely narrow on one possibility while discarding other neighborhoods in the same region with the same features. Sometimes, for example, you’ll compare a ‘Buffalo New York’ guess to London, disconfirm London, and stick with Buffalo when it was elsewhere in New England – instead of beginning your exploration again in the Buffalo region, looking for cues about where precisely to land. You tend to imagine you checked satellite imagery and got confirmation, while not actually accessing any satellite imagery.

Do not reason from the user’s IP address. none of these are of the user’s hometown.

**Protocol (follow in order, no step-skipping):**

Rule of thumb: jot raw facts first, push interpretations later, and always keep two hypotheses alive until the very end.

0. Set-up & Ethics

No metadata peeking. Work only from pixels (and permissible public-web searches). Flag it if you accidentally use location hints from EXIF, user IP, etc. Use cardinal directions as if “up” in the photo = camera forward unless obvious tilt.

1. Raw Observations – ≤ 10 bullet points

List only what you can literally see or measure (color, texture, count, shadow angle, glyph shapes). No adjectives that embed interpretation. Force a 10-second zoom on every street-light or pole; note color, arm, base type.

Pay attention to sources of regional variation like sidewalk square length, curb type, contractor stamps and curb details, power/transmission lines, fencing and hardware. Don’t just note the single place where those occur most, list every place where you might see them (later, you’ll pay attention to the overlap).

Jot how many distinct roof / porch styles appear in the first 150 m of view. Rapid change = urban infill zones; homogeneity = single-developer tracts.

Pay attention to parallax and the altitude over the roof. Always sanity-check hill distance, not just presence/absence. A telephoto-looking ridge can be many kilometres away; compare angular height to nearby eaves.

Slope matters. Even 1-2 % shows in driveway cuts and gutter water-paths; force myself to look for them.

Pay relentless attention to camera height and angle. Never confuse a slope and a flat. Slopes are one of your biggest hints – use them!

2. Clue Categories – reason separately (≤ 2 sentences each)

| Category | Guidance |
| --- | --- |
| Climate & vegetation | Leaf-on vs. leaf-off, grass hue, xeric vs. lush. |
| Geomorphology | Relief, drainage style, rock-palette / lithology. |
| Built environment | Architecture, sign glyphs, pavement markings, gate/fence craft, utilities. |
| Culture & infrastructure | Drive side, plate shapes, guardrail types, farm gear brands. |
| Astronomical / lighting | Shadow direction ⇒ hemisphere; measure angle to estimate latitude ± 0.5°. |
| Separate ornamental vs. native vegetation | Tag every plant you think was planted by people (roses, agapanthus, lawn) and every plant that almost certainly grew on its own (oaks, chaparral shrubs, bunch-grass, tussock). Ask one question: “If the native pieces of landscape behind the fence were lifted out and dropped onto each candidate region, would they look out of place?” Strike any region where the answer is “yes,” or at least down-weight it. |

3. First-Round Shortlist – exactly five candidates

Produce a table; make sure #1 and #5 are ≥ 160 km apart.

| Rank | Region (state / country) | Key clues that support it | Confidence (1-5) | Distance-gap rule ✓/✗ |

3½. Divergent Search-Keyword Matrix

Generic, region-neutral strings converting each physical clue into searchable text. When you are approved to search, you’ll run these strings to see if you missed that those clues also pop up in some region that wasn’t on your radar.

4. Choose a Tentative Leader

Name the current best guess and one alternative you’re willing to test equally hard. State why the leader edges others. Explicitly spell the disproof criteria (“If I see X, this guess dies”). Look for what should be there and isn’t, too: if this is X region, I expect to see Y: is there Y? If not why not?

At this point, confirm with the user that you’re ready to start the search step, where you look for images to prove or disprove this. You HAVE NOT LOOKED AT ANY IMAGES YET. Do not claim you have. Once the user gives you the go-ahead, check Redfin and Zillow if applicable, state park images, vacation pics, etcetera (compare AND contrast). You can’t access Google Maps or satellite imagery due to anti-bot protocols. Do not assert you’ve looked at any image you have not actually looked at in depth with your OCR abilities. Search region-neutral phrases and see whether the results include any regions you hadn’t given full consideration.

5. Verification Plan (tool-allowed actions)

For each surviving candidate list:

| Candidate | Element to verify | Exact search phrase / Street-View target |

Look at a map. Think about what the map implies.

6. Lock-in Pin

This step is crucial and is where you usually fail. Ask yourself ‘wait! did I narrow in prematurely? are there nearby regions with the same cues?’ List some possibilities. Actively seek evidence in their favor. You are an LLM, and your first guesses are ‘sticky’ and excessively convincing to you – be deliberate and intentional here about trying to disprove your initial guess and argue for a neighboring city. Compare these directly to the leading guess – without any favorite in mind. How much of the evidence is compatible with each location? How strong and determinative is the evidence?

Then, name the spot – or at least the best guess you have. Provide lat / long or nearest named place. Declare residual uncertainty (km radius). Admit over-confidence bias; widen error bars if all clues are “soft”.

Quick reference: measuring shadow to latitude

Grab a ruler on-screen; measure shadow length S and object height H (estimate if unknown). Solar elevation θ ≈ arctan(H / S). On date you captured (use cues from the image to guess season), latitude ≈ (90° – θ + solar declination). This should produce a range from the range of possible dates. Keep ± 0.5–1 ° as error; 1° ≈ 111 km.
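The "shadow to latitude" quick reference at the end of the prompt is easier to follow with numbers plugged in. Below is a minimal Python sketch of that arithmetic, not something taken from the prompt's author. Several things here are assumptions: the function names are hypothetical, the declination formula is a common textbook approximation that the prompt does not specify, and the relation latitude ≈ 90° − θ + declination only holds near local solar noon for a site north of the subsolar point (sun roughly due south).

```python
import math

# Rule of thumb quoted in the prompt: 1 degree of latitude is about 111 km.
KM_PER_DEGREE = 111.0


def solar_elevation_deg(object_height: float, shadow_length: float) -> float:
    """Solar elevation theta = arctan(H / S), in degrees (H and S in the same unit)."""
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("object_height and shadow_length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


def solar_declination_deg(day_of_year: int) -> float:
    """Approximate solar declination in degrees.

    ASSUMPTION: simple cosine approximation, accurate to roughly a degree;
    the prompt itself does not say how to obtain the declination.
    """
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))


def latitude_range_deg(object_height: float, shadow_length: float,
                       first_day: int, last_day: int) -> tuple[float, float]:
    """Latitude range implied by a noon shadow over a window of candidate dates.

    Applies latitude = 90 - theta + declination for every day in the window,
    mirroring the prompt's "range from the range of possible dates".
    """
    theta = solar_elevation_deg(object_height, shadow_length)
    estimates = [90.0 - theta + solar_declination_deg(day)
                 for day in range(first_day, last_day + 1)]
    return min(estimates), max(estimates)


def error_radius_km(error_deg: float) -> float:
    """Convert a latitude error in degrees to kilometres (1 degree ~ 111 km)."""
    return error_deg * KM_PER_DEGREE
```

As a usage example, `latitude_range_deg(2.0, 2.0, 152, 181)` treats a 2 m pole casting a 2 m noon shadow on some day in June and gives a band of roughly 67° to 68.4° north, and `error_radius_km(0.5)` turns the prompt's ± 0.5° into about 55 km. A photo taken away from solar noon makes the shadow longer and would bias the estimate poleward, which is one reason the prompt asks for wide error bars.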

(Text: APPSO)
