Chinese-Forums

Allow Duckduckgo spider


imron


@roddy can you allow the duckduckgo spider in robots.txt?

 

Google search is becoming more and more unbearable, and while I switched away from it as my primary search engine a while back, it's still the only major off-site search engine that appears to have access to the site.

 

Normally I'd use Duckduckgo, but when I do a site-specific search, the results tell me that its spider has been disallowed, so it can't find much content.


robots.txt:

User-agent: *
Disallow: /admin
Disallow: /profile
Disallow: /applications/core/interface/file/
Disallow: /notifications/options/
Disallow: /followed/
Disallow: /discover/followed-content/


User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: Sogou web spider
User-agent: MJ12bot
User-agent: dotbot
User-agent: Exabot
User-agent: Wordpress/MU
User-agent: msrbot
User-agent: VB Project
User-agent: NaverBot
User-agent: Yeti
User-agent: moget
User-agent: ichiro
User-agent: Yandex
User-Agent: Charlotte
User-Agent: YoudaoBot
User-agent: sogou spider
User-Agent: bingbot
Disallow: /

https://help.duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/

 

The specific message I get is, "We would like to show you a description here but the site won't allow us."

 

 


1 hour ago, 889 said:

The specific message I get is, "We would like to show you a description here but the site won't allow us."

Yep that's the one I get too.  I figured it was from robots.txt, but that robots.txt doesn't look like it blocks it.
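For anyone who'd rather test than eyeball it, here's a minimal sketch using Python's standard urllib.robotparser against the robots.txt posted above. The page URL and the list of crawler names to test are my own assumptions, and Python's matching rules won't be identical to what each real crawler does, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt as posted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /profile
Disallow: /applications/core/interface/file/
Disallow: /notifications/options/
Disallow: /followed/
Disallow: /discover/followed-content/


User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: Sogou web spider
User-agent: MJ12bot
User-agent: dotbot
User-agent: Exabot
User-agent: Wordpress/MU
User-agent: msrbot
User-agent: VB Project
User-agent: NaverBot
User-agent: Yeti
User-agent: moget
User-agent: ichiro
User-agent: Yandex
User-Agent: Charlotte
User-Agent: YoudaoBot
User-agent: sogou spider
User-Agent: bingbot
Disallow: /
"""


def crawl_permissions(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Assumed example page; any ordinary forum URL behaves the same way.
    page = "https://www.chinese-forums.com/forums/"
    agents = ["DuckDuckBot", "bingbot", "Yandex", "Googlebot"]
    for agent, allowed in crawl_permissions(ROBOTS_TXT, agents, page).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this parser, DuckDuckBot and Googlebot fall through to the `*` group and are allowed on ordinary pages, while bingbot and Yandex hit the `Disallow: /` group and are blocked.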


Will take a look later. Having done some inefficient research while on mobile, it looks like DDG gets its search engine results via APIs to other indexes, one of which is Yandex, which is a historically badly-behaved Russian search engine. But I see similar on Yahoo, while Google looks fine. 


Ok, so...

 

Duckduckgo seems to crawl for ranking purposes, but not for indexing. For indexing, it pulls data from other sources - Bing/Yahoo (same thing now?) and Yandex. I had Bing and Yandex both blocked from waaaaaaaaaaay back. I've removed those blocks. Yandex seems to be better-behaved now. I think Bing still had access to the sitemap, so it could see URLs and titles, and include them in the index, but not the content, and that's what was turning up in Duckduckgo. I've also allowed Baidu back in.
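A rough sketch of that kind of change, assuming the fix amounts to dropping the relevant User-agent lines from the blocked group (the edited file itself isn't shown in this thread). The robots.txt below is a shortened stand-in for the one posted earlier, and the helper names `unblock` and `allowed` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in for the robots.txt posted earlier (not the full file).
OLD_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin

User-agent: Baiduspider
User-agent: MJ12bot
User-agent: Yandex
User-Agent: bingbot
Disallow: /
"""


def unblock(robots_txt, agents):
    """Drop the User-agent lines naming the given agents.

    Assumes at least one other agent stays in each affected group;
    otherwise the group's Disallow line would be left without an owner.
    """
    wanted = {agent.lower() for agent in agents}
    kept = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip().lower() in wanted:
            continue  # this crawler is no longer listed in the blocked group
        kept.append(line)
    return "\n".join(kept) + "\n"


def allowed(robots_txt, agent, url="https://www.chinese-forums.com/forums/"):
    """Check one agent against the rules (the URL is an assumed example page)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


NEW_ROBOTS_TXT = unblock(OLD_ROBOTS_TXT, ["Baiduspider", "Yandex", "bingbot"])

if __name__ == "__main__":
    for agent in ["bingbot", "Yandex", "Baiduspider", "MJ12bot"]:
        before = allowed(OLD_ROBOTS_TXT, agent)
        after = allowed(NEW_ROBOTS_TXT, agent)
        print(f"{agent}: before={before} after={after}")
```

Crawlers removed from the blocked group fall back to the `*` rules, so they can reach ordinary pages again while /admin and the like stay off limits.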

 

If I remember, once I see that's all working better I'll submit the site for a DDG !bang search. However, there's no guarantee of how quickly or how completely we get indexed. 


Theoretically yes, but I haven’t looked at a raw access log for maybe a decade. And there’s likely a delay between spidering and inclusion in the index, and I don’t know if DDG has real-time access to that index, and a search engine looking at a page doesn’t mean it makes it into the index, so...
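For what it's worth, spotting spider visits in a raw access log could look something like the sketch below. It assumes the common "combined" log format, where the User-Agent is the last quoted field; the sample lines and the list of crawler tokens are made up for illustration, not taken from the real server logs. As noted, a crawl showing up here says nothing about whether the page made it into any index.

```python
import re
from collections import Counter

# Crawler tokens to look for in the User-Agent field (an assumed list).
CRAWLER_TOKENS = ("bingbot", "DuckDuckBot", "YandexBot", "Googlebot", "Baiduspider")

# In the combined log format, the User-Agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(log_lines):
    """Count requests per crawler token, ignoring non-crawler traffic."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line does not end with a quoted User-Agent field
        user_agent = match.group(1).lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2020:10:00:00 +0000] "GET /forums/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '203.0.113.9 - - [01/Mar/2020:10:00:05 +0000] "GET /forums/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [01/Mar/2020:10:01:00 +0000] "GET /sitemap.xml HTTP/1.1" 200 900 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '198.51.100.7 - - [01/Mar/2020:10:02:00 +0000] "GET /forums/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for crawler, count in count_crawler_hits(SAMPLE_LOG).most_common():
        print(f"{crawler}: {count}")
```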


On 2/24/2020 at 12:02 PM, roddy said:

For indexing, it pulls data from other sources - Bing/Yahoo (same thing now?) and Yandex

Are you saying DuckDuckGo is just metacrawling other search engines to get its search results? If so, then I stay with google....

 


1 hour ago, Jan Finster said:

If so, then I stay with google....

It's not an "either/or" situation, it's "support both".  If you still use google, enabling Bing/Yahoo/DDG searches won't affect you in any way.  It will however make a big difference to people who don't use google search.

 

Personally, I can't stand the new look of the google search results page, and that was the driver to switching almost all my searches to DDG.  Previously I was about 60/40, with DDG being 60.  Now it's like 95/5.


  • 2 months later...

That's about 30 visits a day from Bing-bot search engines now, up from 3 at the start of the year. Still in the region of 1%-2% of search engine traffic, but all to the good. Thanks for raising it. I've submitted for a !bang search, but not sure if it'll get approved or not. 


  • 2 weeks later...

Although... anyone using non-Google should bear in mind that Bing et al don't have so much indexed. Bing's webmaster tools are showing me 6k-8k pages indexed depending on what date in the last six months you pick (and no real upward trend). Google reports 50k pages indexed.

 

What is going up is the number of people clicking through from Bing. Hopefully as that continues it'll lead to more indexing.

 

In other spider news, Huawei seems to be desperately gorging itself on our pages as it gears up for a world without Google. Generally, it isn't making itself any friends. But our server is humming along quite nicely, it seems, so let it gorge.


2 hours ago, roddy said:

anyone using non-Google should bear in mind that Bing et al don't have so much indexed.

This matches my experience.  There are some well-known posts/threads of mine and others on here that I can find in Google with a few choice keywords that DDG fails to pick up on (both searches limited to site:chinese-forums.com).  It's getting better, but Google often still wins out for a site search of specific content.

