all 28 comments

[–]Aureus 10 insightful - 2 fun - (0 children)

Thanks so much! This is really important and a good guide.

[–]PuttItBack 6 insightful - 4 fun - (0 children)

Wish I'd seen this last night when I was backing things up before the ban lol.

[–]holy_goat 6 insightful - 1 fun - (0 children)

this could be useful for moving content over to this and/or ruqqus. keep a good backlog so more people will join.

[–]stupidmechanic 4 insightful - 2 fun - (5 children)

fetch_links.py fails to run; says there's a syntax error in psaw/PushshiftAPI.py line 251

error message:

Traceback (most recent call last):
  File "./fetch_links.py", line 10, in <module>
    from psaw import PushshiftAPI
  File "/usr/lib/python2.7/dist-packages/psaw/__init__.py", line 7, in <module>
    from .PushshiftAPI import PushshiftAPI, PushshiftAPIMinimal
  File "/usr/lib/python2.7/dist-packages/psaw/PushshiftAPI.py", line 251
    return batch
SyntaxError: 'return' with argument inside generator

Anyone getting the same problem?

[–][deleted] 4 insightful - 2 fun - (3 children)

or how about running

pip install -U pip
pip install psaw --upgrade

it looks like your line 251 is out of date? https://github.com/dmarx/psaw/blob/master/psaw/PushshiftAPI.py#L251

edit: I think that's it, you have psaw 0.0.10 from 3/10, not psaw 0.0.12 from 3/18. hopefully they fixed this bug https://github.com/dmarx/psaw/blob/f8609bcc9dc945c2c1b0e732ac8dba25f19545fc/psaw/PushshiftAPI.py#L251
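A minimal sketch for confirming which psaw version a given interpreter actually sees, assuming Python 3.8+ (for `importlib.metadata`) and plain dotted numeric version strings. The helper names are made up for illustration; the 0.0.12 threshold is just the version mentioned in the comment above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.0.12' into (0, 0, 12) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_older_than(version, minimum):
    """True when `version` is strictly older than `minimum`."""
    return version_tuple(version) < version_tuple(minimum)


if __name__ == "__main__":
    found = installed_version("psaw")
    if found is None:
        print("psaw is not installed for this interpreter")
    elif is_older_than(found, "0.0.12"):
        print("psaw", found, "is older than 0.0.12; try upgrading")
    else:
        print("psaw", found, "looks recent enough")
```

Run it with the same interpreter that runs the archiver, since python2.7 and python3 keep separate package directories.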

[–]stupidmechanic 4 insightful - 2 fun - (2 children)

In reply to

"hmmm I haven't seen this one before, and it's not an issue with the archiver code.

what does

python --version 

output? maybe try running it with python3 like

python3 ./fetch_links.py"

God, I'm an idiot!

python3 ./fetch_links.py

made this error go away. Now going ahead. Need to set python3 as default interpreter instead of python2.7. Besides, I had to manually copy-paste the *.py files into my compilers, maybe that has something to do with this. Thanks man.
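For context, `return` with a value inside a generator is only legal in Python 3 (3.3 and later), which is why the same psaw file parses under python3 but not under python2.7. A minimal sketch of the construct; `batches` and `drain` are illustrative names, not psaw code:

```python
def batches(items, size):
    """Yield lists of up to `size` items, then return the total item count."""
    total = 0
    batch = []
    for item in items:
        batch.append(item)
        total += 1
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
    # Python 3 only: the returned value is carried on StopIteration.
    return total


def drain(gen):
    """Collect everything a generator yields plus its return value."""
    collected = []
    while True:
        try:
            collected.append(next(gen))
        except StopIteration as stop:
            return collected, stop.value


chunks, total = drain(batches(range(5), 2))
print(chunks, total)
```

Under python2.7 this file is rejected at compile time with the same SyntaxError as above, before any line of it runs.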

[–][deleted] 5 insightful - 1 fun - (1 child)

cool, i was afraid it was bad advice so i deleted. im not too good at python stuff. I will add this advice to this other troubleshooting advice: https://github.com/libertysoft3/reddit-html-archiver/issues/18

[–]stupidmechanic 4 insightful - 1 fun - (0 children)

Yes indeed, it seems to be a bad psaw installation. 0.0.10

[–]1nvar 4 insightful - 1 fun - (0 children)

Pushshift has a mostly complete copy of the last 5 years of reddit, it's how https://redditsearch.io/ works! Thanks, archivists :-)

[–]sosorreal 3 insightful - 2 fun - (2 children)

is there a way to get our Saved content from subs that are now banned?

[–][deleted] 2 insightful - 1 fun - (1 child)

The tool I linked doesn't interact with reddit directly, so not with that. But using the reddit API you may be able to still get ids of your saved content, and then the tool I linked could download it. So nothing easy exists.
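A rough sketch of the "get ids of your saved content" step, assuming the third-party PRAW library and credentials for a script-type reddit app (all four credential arguments are placeholders you would supply). Whether saved items from banned subs still show up in the listing is not confirmed here, and handing the ids to the archive tool is left open.

```python
def saved_fullnames(saved_items):
    """Return the fullname (e.g. 't3_abc123') of every saved item."""
    return [item.fullname for item in saved_items]


def fetch_saved_fullnames(client_id, client_secret, username, password):
    """Log in with PRAW and list the fullnames of everything the user saved."""
    import praw  # third-party: pip install praw

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        username=username,
        password=password,
        user_agent="saved-id-export sketch",
    )
    # limit=None asks PRAW to page through as many items as the API returns.
    return saved_fullnames(reddit.user.me().saved(limit=None))
```

Fullnames starting with `t3_` are submissions and `t1_` are comments. Reddit listings are generally capped at around 1000 items, so a long saved history may come back incomplete.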

[–]sosorreal 4 insightful - 1 fun - (0 children)

hmm okay, thank you for the information

[–]JasonCarswell 3 insightful - 2 fun - (3 children)

I thought your brain was sexy before, but now it's exponentially prodigious.

Most of it is Greek to me, but I have some questions:

  1. Is it possible to reupload old content into SaidIt/NotABug/etc that uses its own old datestamp rather than being "new"? If this were possible, that would be terrific for those who've backed up yet feel like they've lost their communities. If it's not possible then reposting it all anew could be painful for the rest of us unless there were limits. Which brings me to...

  2. Is it possible to make a bot to auto reup or repost an archive? And is it possible for that bot to skip over /s/all and/or use the old datestamp? And is this bot-post-frequency a conversation worth having in /s/SaidItBots regarding the frequency of posts for not only stuff like this but in general and/or specifically about other topics/issues. Of course it's also simply easy to leave it until it becomes an issue - if ever.

  3. Regarding the archive votes, IMO, that's interesting information that's potentially worth saving. As you import the old posts as reups or reposts I'm guessing the SaidIt votes might start from zero. It might be nice for that import-bot to make a note of the votes in the comments - or if you want to get fancy make a new "untouchable" RAI (Reddit Archive Import) vote section that displays the frozen score with a tally that can only be adjusted if more information comes from further import information (ie. if they are importing from a not-yet-banned-sub and/or a more recent archive).

  4. Is there any way to validate the archives and importing? I wouldn't expect forgeries or tampering right away, but I suspect that eventually there could be meddling if there was a motive and a way. Sad but true.

  5. Federating SaidIt is the first big step in what I believe is the most important goal - decentralization. In my limited non-tech savvy opinion, it seems to me that sharing backup(s) of the archive(s) (via torrent?) is the next most important step. I don't know how related this kind of large-scale archiving is comparative to isolated sub-archiving and/or if they can be imported in a similar manner - but it sure would be nice for anyone interested to be able to. I have no interest in creating my own server and instance unless it helps in my small way as backup, but I am very interested in helping to perpetuate archive backup torrents because I know that can help. (People are still asking me to share my old Tigole aggregations and anticipating my new ones that I haven't made in about a year.)

  6. If relevant - maybe repost on /s/DecentralizeAllThings and/or add to that wiki?

  7. There is no 7

Great work!

[–][deleted] 3 insightful - 1 fun - (2 children)

Well hello to you too good looking. Compliments get replies!

  1. no not really. you've gotta be careful with the import order to at least keep it somewhat chronologically sorted. the original date could be added to the post title or comment.
  2. yes bots could automatically post here. the sub in question could hide from all, or even be private temporarily if a ton needed importing and you wanted to not "take over the site"
  3. yeah you might just have to add the score in the comment, but the real saidit score would be the default 1 upvote. pulling this off without having to touch the saidit codebase makes things way easier.
  4. you could do some basic checks, but the pushshift archive for reddit data is what it is. using other sources, you can always make a checksum of the content on both systems, to ensure it's the same.
  5. we should release all saidit data. m7 is into it. it's been low priority, and we'd need to inform everybody and update a privacy policy
  6. one fun bit of decentralization news is that Lemmy has pulled off a real reddit federation. those guys are far left/commies but there's already a non far left instance up. real progress in reddit world!
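The checksum idea in answer 4 could look roughly like this. It is a sketch under assumptions: the post is a dict with reddit-style field names (`author`, `title`, `selftext`, `created_utc`), and the score is deliberately left out of the hash because it is expected to differ after an import.

```python
import hashlib
import json


def content_checksum(post, fields=("author", "title", "selftext", "created_utc")):
    """SHA-256 over a canonical JSON encoding of the chosen fields."""
    subset = {name: post.get(name) for name in fields}
    canonical = json.dumps(
        subset, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up example records; key order and score differ, content does not.
original = {"author": "example_user", "title": "Hello", "selftext": "Body",
            "created_utc": 1593561600, "score": 12}
imported = {"title": "Hello", "author": "example_user", "selftext": "Body",
            "created_utc": 1593561600, "score": 1}
tampered = dict(imported, selftext="Body (edited)")

same_as_import = content_checksum(original) == content_checksum(imported)
same_as_tampered = content_checksum(original) == content_checksum(tampered)
print(same_as_import, same_as_tampered)
```

A matching checksum only shows that both copies agree with each other; it says nothing about whether the archive itself was accurate in the first place.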

[–]JasonCarswell 2 insightful - 2 fun - (1 child)

"Good looking"? Maybe once, a dozen years ago. Today I took the first selfie in a dozen years or more, [for reasons] perhaps as a before if there's an after. It's on a camera I have to plug in to see if any are actually good enough to share - so "good looking" remains to be seen - or not.

I didn't expect all answered, nor so quickly, but am glad you did, thanks. Regarding several of them, if demand calls for it, it might be worth supervising any bot development to be sure it's done properly and including the dates, votes, and whatever other metadata might be worth keeping as well as to be sure it doesn't jam up your system. I'm not a Redditor but I know there's some value there, and it seems some folks migrating are keen on it.
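As a sketch of what such a bot would need to do before posting anything: sort the archive oldest-first, put the original date in the title, and keep the archived score for a comment. This assumes each archived post is a dict with `title`, `created_utc` and `score` keys; the output keys `title` and `first_comment` are invented for illustration, and nothing here talks to SaidIt.

```python
from datetime import datetime, timezone


def prepare_for_import(posts):
    """Sort archived posts oldest-first and attach the original date and score."""
    prepared = []
    for post in sorted(posts, key=lambda p: p["created_utc"]):
        posted = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        prepared.append({
            "title": "[{}] {}".format(posted.strftime("%Y-%m-%d"), post["title"]),
            "first_comment": "Original score at archive time: {}".format(post["score"]),
        })
    return prepared


# Made-up archive entries, deliberately out of order.
archive = [
    {"title": "Second post", "created_utc": 1593648000, "score": 7},
    {"title": "First post", "created_utc": 1593561600, "score": 42},
]
prepared = prepare_for_import(archive)
for item in prepared:
    print(item["title"], "|", item["first_comment"])
```

Posting in this order keeps the "new" listing roughly chronological even though every post gets a fresh site timestamp.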

4) It occurred to me that there could be some fun creative reasons to forge a thread, though you wouldn't need to go so far as to import it. Specifically, 2 primo examples come to mind - the brilliant epic https://en.wikipedia.org/wiki/Les_Liaisons_dangereuses and clever and beautiful https://en.wikipedia.org/wiki/Griffin_and_Sabine both written as epistolary novels (correspondence letters between people explaining events and thus laying out the story for the reader).

5) I can easily understand why you may have been distracted by everything since the dawn of SaidIt, but IMO, to weaken the target on your backs and to strengthen the future of Saidit (and freedom for humanity) it seems like this one should be kicked up to be among the top priorities - especially since the 2020 gear shift. Who knows what other nonsense bullshit they may be planning. Things are bad but it seems certain they are going to get worse and stay worse forever unless they are outed with something decentralized they can't defeat. I hate to say it but if you or M7 were to be grabbed for some stupid mask infraction or whatever, intentionally or accidentally silencing SaidIt or leaving it rudderless, we'd have no way of knowing or helping or whatever. (I hope you guys have some backup plans, deadman switches, virtual emergency flares, etc.)

6) That is fun. If I recall, Lemmy had an epic fail data loss at one point? What's the other instance called? (Today I saw your Reddit alternatives list got censored.) I don't know what "real Reddit federation" means. I thought you had one. Can they merge? This is exciting (I think). I read the federation news a few weeks ago (congrats) and meant to add it to the SaidIt article on IG, but I didn't understand parts of it and wanted to get more feedback/details from you two. Then the DDoS happened, then I got locked out for several days, then my end got fubar and I took all that as a sign to finally quit my SaidIt addiction which I was slowly getting around to (by making more banners and non-discussion stuff). I'm going to try to only come back at the ends and middles of every month until my project(s) are ready to share. I'll make exceptions and return to SaidIt for 3 things: 1) work on CSS and/or banners stuff (with your help when you're not busy) 2) work on the mobile SaidIt thing you'd mentioned (whenever you've got time for it, or can explain it and leave it with me, or whatever) or 3) if you guys are interested I could design a logo for your new federation under your guidance (something for the inevitable online store).

Also, what's new regarding SaidIt the last few weeks? I feel like I missed some things (I don't recall blue checkmarks and some other newish formatting). (I'm good news-wise, and know about the purges on Reddit and YouTube from elsewhere and I've only seen a bit of it here so far. I plan to catch up a bit while I share some stuff over the next few days before I return to my own non-SaidIt-addicted new normal.) PM if you like, or not, or ignore it if there's nothing grand since the DDoS (of which I'm more than a little curious about).

[–][deleted] 1 insightful - 0 fun - (0 children)

Oh, one called Prismo had a data fail. Here's the first non lemmy owned lemmy instances:

https://nobodyhasthe.biz/

https://darkhumorandmemes.com/

no one talks about lemmy cuz they're commies, but it's good stuff.

Things have been good here, it's exciting to have some new users. There's nearly been some volunteer coders and designers too!

[–]bug-in-recovery 3 insightful - 1 fun - (7 children)

Check here:

https://archive.org/details/debatealtright

This guy already archived most good subs.

[–]spezz 4 insightful - 1 fun - (3 children)

based. what happened to the r/MDE archive?

edit: well oy vey that was just too much wrongthink because it just got deleted

edit 2: nah he just put the wrong url, still grab it while you can

[–]bug-in-recovery 4 insightful - 1 fun - (1 child)

I fucked up the link, still there:

https://archive.org/details/DebateAltRight

[–][deleted] 2 insightful - 1 fun - (0 children)

Github will host these archives for free, so they can be easily viewed without downloading. I'm trying to spread the word.

[–][deleted] 2 insightful - 1 fun - (0 children)

i tested that myself, pushshift does not have much of that data, not sure what happened. edit: im an idiot, i checked mde not milliondollarextreme

[–]holy_goat 4 insightful - 1 fun - (2 children)

archive.org sniped it real quick.... also archive.org is in some potential legal trouble at the moment (literary publishers suing them) so we can't be confident they'll stay around.

[–]spezz 3 insightful - 1 fun - (0 children)

this

see if the-eye.eu will take them

[–]bug-in-recovery 3 insightful - 1 fun - (0 children)

It's still there, I fucked up the link:

https://archive.org/details/DebateAltRight

Apparently, their URLs are case sensitive.

[–]calmbluejay 2 insightful - 1 fun - (3 children)

I just get a blank page

[–][deleted] 2 insightful - 1 fun - (2 children)

a blank page for a link like this? it should at least show '[]' https://api.pushshift.io/reddit/search/submission/?subreddit=darkhumourandmemes

[–]calmbluejay 4 insightful - 1 fun - (1 child)

This is what I get : https://i.imgur.com/lCZTsMG.png

[–][deleted] 3 insightful - 2 fun - (0 children)

Okay, that looks good. Using this link is just a quick way to tell if the content you want is there in PushShift. So if the sub you are interested in returns anything in that 'data' array, like it did here, you can use the archive tool and it will work.
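The same check can be scripted. A minimal sketch, assuming the endpoint from the link above still answers with JSON holding a `data` array; the `size` parameter and the function names are assumptions for illustration, and only `subreddit_is_archived` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_search_url(subreddit, size=1):
    """Build the query URL for a single-result submission search."""
    return SEARCH_URL + "?" + urlencode({"subreddit": subreddit, "size": size})


def has_archived_posts(payload):
    """True when the decoded JSON response carries a non-empty 'data' list."""
    data = payload.get("data") if isinstance(payload, dict) else None
    return isinstance(data, list) and len(data) > 0


def subreddit_is_archived(subreddit):
    """Network call: fetch one result and check its 'data' array."""
    with urlopen(build_search_url(subreddit), timeout=30) as response:
        return has_archived_posts(json.load(response))
```

Calling `subreddit_is_archived("darkhumourandmemes")` would answer the same question as opening the link in a browser and looking for entries in `data`.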