ciferecaNinjo

ciferecaNinjo@fedia.io · 10 months ago

As far as we know, Google is not giving up any data. The crawler still must store a copy of the text for the index. The only certainty we have is that Google is no longer sharing it.

ciferecaNinjo@fedia.io · edit-2 10 months ago

Here’s the heart of the not-so-obvious problem:

Websites treat the Google crawler like a first-class citizen. Paywalls give Google unpaid, junk-free access. Then Google search results direct people to a website that treats humans differently (worse). So Google users are led to sites they cannot access. The heart of the problem is access inequality. Google effectively serves to refer people to sites that are not publicly accessible.

I do not want to see search results I cannot access. Google cache was the equalizer that neutralized that problem. Now that problem is back in our face.

ciferecaNinjo@fedia.io · edit-2 10 months ago

From the article:

“was meant for helping people access pages when way back, you often couldn’t depend on a page loading. These days, things have greatly improved. So, it was decided to retire it.” (emphasis added)

Bullshit! The web gets increasingly enshittified and content is less accessible every day.

For now, you can still build your own cache links even without the button, just by going to “https://webcache.googleusercontent.com/search?q=cache:” plus a website URL, or by typing “cache:” plus a URL into Google Search.

You can also use 12ft.io.
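
For anyone who wants to script either trick, here is a minimal sketch in Python. The function names are mine, the `https://` scheme on the 12ft.io link is an assumption (their FAQ only says to prepend `12ft.io/`), and there is no guarantee either service keeps answering on these addresses.

```python
# Prefix quoted in the article for hand-built Google cache links.
GOOGLE_CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"

# 12ft.io's FAQ says to prepend "12ft.io/"; the https:// scheme is an assumption.
TWELVE_FT_PREFIX = "https://12ft.io/"


def google_cache_url(page_url: str) -> str:
    """Return the cache prefix plus the page URL, as the article describes.

    Plain concatenation only; a page URL containing '&' may need
    percent-encoding, which this sketch does not attempt.
    """
    return GOOGLE_CACHE_PREFIX + page_url


def twelve_ft_url(page_url: str) -> str:
    """Return the page URL with the 12ft.io prefix in front of it."""
    return TWELVE_FT_PREFIX + page_url


print(google_cache_url("https://example.com/article"))
print(twelve_ft_url("https://example.com/article"))
```

Both helpers only build the link text. Whether the page actually comes back is up to the service.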

Cached links were great if the website was down or quickly changed, but they also gave some insight over the years about how the “Google Bot” web crawler views the web. … A lot of Google Bot details are shrouded in secrecy to hide from SEO spammers, but you could learn a lot by investigating what cached pages look like.

Okay, so there’s a more plausible theory about the real reason for this move. Google may be trying to increase the secrecy of how its crawler functions.

The pages aren’t necessarily rendered like how you would expect.

More importantly, they don’t render the way authors expect. And that’s a fucking good thing! It’s how caching helps give us some escape from enshittification. From the 12ft.io FAQ:

“Prepend 12ft.io/ to the URL webpage, and we’ll try our best to remove the popups, ads, and other visual distractions.”

It also circumvents #paywalls. No doubt there must be legal pressure on Google from angry website owners who want to force their content to come with garbage.

The death of cached sites will mean the Internet Archive has a larger burden of archiving and tracking changes on the world’s webpages.

The possibly good news is that Google’s role shrinks a bit. Any Google shrinkage is a good outcome overall. But there is a concerning relationship between archive.org and Cloudflare. I depend heavily on archive.org largely because Cloudflare has broken ~25% of the web. The day #InternetArchive becomes Cloudflared itself, we’re fucked.

We need several non-profits to archive the web in parallel redundancy with archive.org.

ciferecaNinjo@fedia.io · 10 months ago

Bingo. When I read that part of the article, I felt insulted. People see the web getting increasingly enshittified and less accessible. The increased need for cached pages has justified the existence of 12ft.io.

~40% of my web access is now dependent on archive.org and 12ft.io.

So yes, Google is obviously bullshitting. Clearly there is a real reason for nixing cached pages and Google is concealing that reason.

ciferecaNinjo@fedia.io · edit-2 10 months ago

This is probably an attempt to save money on storage costs.

That’s in fact what the article claims as Google’s reason. But that seems irrational. Google still needs to index websites for the search engine, so the storage is still needed because the data collection is still needed. The only difference (AFAICT) is that Google is simply no longer sharing that data. Also, there are bigger pots of money in play than piddly storage costs.

ciferecaNinjo@fedia.io · edit-2 11 months ago

You were given plenty of references. You can verify it yourself if you want to get a clue – or continue to spread misinfo to the contrary. You are doing a disservice to your users and the fedi by maintaining patronage to the privacy-abusing corp.

If you truly don’t understand the problems with Cloudflare, why not embrace transparency and inform people who visit your site that CF is used and that CF sees all their traffic despite the padlock? If you are proud of this, why conceal it?

ciferecaNinjo@fedia.io · edit-2 11 months ago

Not exactly. !showerthoughts@lemmy.world was a poor choice, as are:

!showerthoughts@zerobytes.monster ← Cloudflare
!showerthoughts@sh.itjust.works ← Cloudflare
!showerthoughts@lemmy.ca ← Cloudflare
!showerthoughts@lemm.ee ← Cloudflare
!hotshowerthoughts@x69.org ← Cloudflare, and possibly irrelevant
!showerthoughts@lemmy.ml ← not CF, but copious political baggage, abusive moderation & centralized by disproportionate size

They’re all shit & the OP’s own account is limited to creating a new community on #lemmyWorld. !showerthoughts@lemmy.ml would be the lesser of evils, but the best move would be to create an account on a digital rights-respecting instance that allows community creation and then create a showerthoughts community there.

(EDIT) !showerThoughts@fedia.io should address these issues.
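
If you want to check an instance yourself rather than take my word for it, here is a rough sketch in Python. It assumes a Cloudflare-fronted site sends a `CF-RAY` header or `Server: cloudflare`, which is what I normally see; the function names are mine, and a miss proves nothing since the heuristic can be fooled either way.

```python
import urllib.error
import urllib.request
from typing import Mapping


def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: True if the response headers carry typical Cloudflare markers."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return "cf-ray" in lowered or lowered.get("server", "") == "cloudflare"


def fetch_headers(url: str) -> Mapping[str, str]:
    """Send a HEAD request and return the response headers.

    Error responses (e.g. a 403 block page) still carry headers, so those
    are returned too instead of raising.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return dict(response.headers.items())
    except urllib.error.HTTPError as err:
        return dict(err.headers.items())


# Offline demonstration with hand-written headers (no network needed).
print(looks_like_cloudflare({"Server": "cloudflare", "CF-RAY": "example-id"}))
print(looks_like_cloudflare({"Server": "nginx"}))
```

To test a live site, pass the result of `fetch_headers("https://<instance>/")` to `looks_like_cloudflare`.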

ciferecaNinjo@fedia.io · edit-2 11 months ago

Normal users don’t have these issues.

That’s not true. Cloudflare marginalizes both normal users and street-wise users. In particular:

users whose ISP uses CGNAT to distribute a limited range of IPv4 addresses (this generally impacts poor people in impoverished regions)
the Tor community
VPN users
users of public libraries, and generally networks where IP addresses are shared
privacy enthusiasts who will not disclose ~25% of their web traffic to one single corporation in a country without privacy safeguards
blind people who disable images in their browsers (which triggers false positives for robots, as scripts are generally not interested in images either)
the permacomputing community and people on limited internet connections, who also disable browser images to reduce bandwidth which makes them appear as bots
people who actually run bots – Cloudflare is outspokenly anti-robot and treats beneficial bots the same as malicious bots
…

There are likely more oppressed groups beyond that because there is no transparency with Cloudflare.
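
On the CGNAT point: if you are unsure whether your own ISP does this, one rough check is whether the WAN address your router reports falls inside 100.64.0.0/10, the shared address space that RFC 6598 reserves for carrier-grade NAT. A minimal Python sketch follows; the function name is mine, and some ISPs run CGNAT on other private ranges, so a negative result is not conclusive.

```python
import ipaddress

# Shared address space reserved by RFC 6598 for carrier-grade NAT.
CGNAT_SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")


def in_cgnat_shared_space(address: str) -> bool:
    """True if the given IP address lies inside 100.64.0.0/10."""
    return ipaddress.ip_address(address) in CGNAT_SHARED_SPACE


# Feed it the WAN address shown on your router's status page.
print(in_cgnat_shared_space("100.64.12.34"))
print(in_cgnat_shared_space("8.8.8.8"))
```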

ciferecaNinjo@fedia.io · edit-2 11 months ago

It’s an abuse of the fediverse and antithetical to #decentralization to use Cloudflare. And ironically your comment comes in response to broken functionality manifesting from links to exclusive venues appearing in an openly public forum.

ciferecaNinjo@fedia.io · edit-2 11 months ago

“Petty” for not supporting the elitist exclusivity that you support? Cloudflare blocks impoverished communities whose ISPs use CGNAT because they cannot afford an IPv4 address for every customer. Shame on CF pushers and shame on you for supporting marginalization by giant corps while backing privacy abuse.

ciferecaNinjo@fedia.io · edit-2 11 months ago

And cf also allows you to block and report child porn

That’s been tried. When someone reported CP to Cloudflare, CF demanded the identity of the whistleblower, then doxxed them to the offending CF user, who then published the whistleblower’s identity so their users could retaliate. When the CEO (Matthew Prince) was confronted about this, his reply was that the whistleblowers “should have used fake names”. Then this company you support had the nerve to claim to have a privacy pledge: “[A]ny personal information you provide to us is just that: personal and private.”

Also cf is about the only way to make federation affordable and safe. (emphasis mine)

Forcing children to reveal their residential IP addresses to the fedi, whereby any interested person (read: child predators) can derive their approximate location – do you really think that’s a good idea for safety?

What are you even thinking? It most certainly is not safe to expose 20%+ of everyone’s traffic to a single corporation.

ciferecaNinjo@fedia.io · edit-2 11 months ago

#digitalExclusion

Shame this is posted on a centralized Cloudflare instance, which causes problems for people using Tor, VPNs, CGNAT, etc.:

ciferecaNinjo@fedia.io · edit-2 1 year ago

Not sure what Grafana is but I can’t even visit the site because they block Tor (403). Gotta love how easy it is to see-and-avoid some privacy-hostile venues. If you were using Tor you might not have wasted 1 minute with that site.

ciferecaNinjo@fedia.io · 1 year ago

I can’t view the pic. FYI, #imgur is itself user-hostile. Specifically, it’s of the Tor-hostile variety. Sometimes it works but often it kicks Tor users to the curb. Doesn’t sh.itjust.works support images? If not, you might be on a user-hostile platform ;)

ciferecaNinjo@fedia.io · 1 year ago

Gender is somewhat relevant here, according to my women’s studies course in uni. When women are describing a problem, they don’t usually want solutions. They want support, understanding, & sympathy, contrary to the typical male response, which is to give advice & propose solutions, and that has a good chance of ending badly.

ciferecaNinjo@fedia.io · edit-2 1 year ago

And IIRC, license plates only need to be censored if bad behavior is demonstrated. Notice that the car to the left which was correctly parked has an exposed license plate.

What baffles me is that the plate number is only meaningful to law enforcement. The public does not get access to the records associated with a plate number. I see no reason to hide the info from law enforcement. The evidence may fall short of the standard needed to be usable, but so be it.

ciferecaNinjo@fedia.io · 1 year ago

When that happens, I register on whatever forum it was where someone said that just to say (necropost if needed) that I had the same question, searched it, and the search results brought me here where an asshole is saying to search it.