No that’s not it. My address is unique to the bank, full headers & path match up with other mail from them, and the means to reach them back are correct (yes I examine every character for imposters using od -c).
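For anyone who wants the same kind of character-by-character check without reading an od -c dump by eye, here is a minimal Python sketch. It is not what I actually ran; the function name and the sample address are made up for illustration, and it only flags non-ASCII and control characters, which is a crude stand-in for real homoglyph detection.

```python
import unicodedata


def find_suspicious_chars(text):
    """Return (index, char, Unicode name) for each non-ASCII or control character."""
    hits = []
    for i, ch in enumerate(text):
        is_control = ord(ch) < 32 and ch not in "\t\n\r"
        if ord(ch) >= 127 or is_control:
            hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # Hypothetical address: the second letter of "bank" is Cyrillic U+0430, not Latin "a".
    for hit in find_suspicious_chars("support@b\u0430nk.example"):
        print(hit)
```

An empty result means every character is plain printable ASCII (plus tab/newline), which is what you would expect from an address or URL that is what it appears to be.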
Can you explain why they would want to anonymise the tracker pixels? Doesn’t that defeat the purpose?
In principle the ideal archive would contain the JavaScript for forensic (and similar) use cases, as there is both a document (HTML) and an app (JS) involved. But then we would want the choice whether to run the app (or at least inspect it), while also having the option to offline faithfully restore the original rendering. You seem to imply that saving JS is an option. I wonder if you choose to save the JS, does it then save the stock skeleton of the HTML, or the result in that case?
wget has a --load-cookies file option. It wants the original Netscape cookie file format. Depending on your GUI browser you may have to convert it. I recall in one case I had to parse the session ID out of a cookie file then build the expected format around it. I don’t recall the circumstances.
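A rough sketch of the "build the expected format around it" step, in Python. The Netscape format is one tab-separated line per cookie (domain, subdomain flag, path, secure flag, expiry as epoch seconds, name, value) under a header comment. The function name, the domain, the cookie name and the expiry below are all placeholders, not anything from a real bank.

```python
def netscape_cookie_file(domain, name, value, expires, path="/", secure=True):
    """Build the text of a one-cookie file in Netscape cookie file format."""
    # A leading dot on the domain conventionally means "also valid for subdomains".
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    fields = [
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        str(expires),  # expiry as seconds since the Unix epoch
        name,
        value,
    ]
    return "# Netscape HTTP Cookie File\n" + "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Hypothetical values; the session ID would come out of your browser's cookie store.
    text = netscape_cookie_file(".bank.example", "SESSIONID", "abc123", 2000000000)
    with open("cookies.txt", "w") as fh:
        fh.write(text)
```

The resulting cookies.txt is what you would hand to wget --load-cookies.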
Another problem: some anti-bot mechanisms crudely look at user-agent headers and block curl attempts on that basis alone.
(edit) when cookies are not an issue, wkhtmltopdf is a good way to get a PDF of a webpage. So you could have a script do a wget to get the HTML faithfully, and wkhtmltopdf to get a PDF, then pdfattach to put the HTML inside the PDF.
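The three-step idea could be scripted along these lines. This is an untested-in-anger sketch: it assumes wget, wkhtmltopdf and poppler's pdfattach are all on the PATH, and the function names and output file naming are my own invention.

```python
import subprocess


def build_archive_commands(url, stem):
    """Return the three command lines: fetch HTML, render PDF, attach HTML to PDF."""
    html = f"{stem}.html"
    plain_pdf = f"{stem}.plain.pdf"
    final_pdf = f"{stem}.pdf"
    return [
        ["wget", "-O", html, url],                  # raw HTML as served
        ["wkhtmltopdf", url, plain_pdf],            # rendered snapshot
        ["pdfattach", plain_pdf, html, final_pdf],  # embed the HTML in the PDF
    ]


def archive(url, stem):
    """Run the commands in order, stopping at the first failure."""
    for argv in build_archive_commands(url, stem):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only prints the plan; call archive() to actually run it.
    for argv in build_archive_commands("https://example.org/receipt", "receipt"):
        print(" ".join(argv))
```

Note that wget and wkhtmltopdf each fetch the page separately here, so a page that changes between requests could give an HTML attachment that does not exactly match the PDF rendering.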
(edit2) It’s worth noting there is a project called curl-impersonate which makes curl look more like a GUI browser to get more equal treatment. I think they go as far as adding a JavaScript engine or something.
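For the crude user-agent checks specifically, sending a browser-like header is often enough. A minimal Python sketch using only the standard library; the user-agent string and the URL are just examples I picked, and this does nothing about the deeper fingerprinting that curl-impersonate is aimed at.

```python
import urllib.request

# Example string only; copy the one your own browser actually sends.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"


def browser_like_request(url, user_agent=BROWSER_UA):
    """Build a request that carries a browser-style User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = browser_like_request("https://bank.example/statement")  # hypothetical URL
    print(req.header_items())
    # To actually fetch: urllib.request.urlopen(req).read()
```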
It’s perhaps the best way for someone that has a good handle on it. Docs say it “sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.” So you would need to tune it so that it’s not grabbing objects that are irrelevant to the view, and probably exclude some file types like videos and audio. If you get a well-tuned command worked out, that would be quite useful. But I do see a couple shortcomings nonetheless:
But those issues aside I like the fact that wget does not rely on a plugin.
The other thing is, what about JavaScript? JS changes the presentation.
Markdown is probably ideal when saving an article, like a news story. It might even be quite useful to get it into a Gemini-compatible language. But what if you are saving the receipt for a purchase? A tax auditor would suspect shenanigans. So the idea with archival is generally to closely (faithfully) preserve the doc.
IIUC you are referring to this extension, which is Firefox-only (unlike the save page WE, which has a Chromium version).
Indeed the beauty of ZIP is stability. But the contents are not. HTML changes so rapidly, I bet if I unzip an old MAFF file it would not have stood the test of time well. That’s why I like the PDF wrapper. Nonetheless, this WebScrapBook could stand in place of the MHTML from the save page WE extension. In fact, save page WE usually fails to save all objects for some reason. So WebScrapBook is probably more complete.
(edit) Apparently webscrapbook gives a choice between htz and maff. I like that it timestamps the content, which is a good idea for archived docs.
(edit2) Do you know what happens with JavaScript? I think JS can be quite disruptive to archival. If webscrapbook saves the JS, it’s saving an app, in effect, and that language changes. The JS also may depend on being able to access the web, which makes a shitshow of archival because obviously you must be online and all the same external URLs must still be reachable. OTOH, saving the JS is probably desirable if doing the hybrid PDF save because the PDF version would always contain the static result, not the JS. Yet the JS could still be useful to have a copy of.
(edit3) I installed webscrapbook but it had no effect. Right-clicking does not give any new functions.
Yeah I’m with you… it was more of an attempt at humor. Although if you search around it’s actually common for people to ask how to check if their spouse is on dating sites… which may be inspired by the whole Ashley Madison databreach.
Does pdfinfo give any indication of the application used to create the document?
Oracle Documaker PDF Driver
PDF version: 1.3
If it chokes on the Java bit up front, can you extract just the PDF from the file and look at that?
Not sure how to do that but I did just try pdfimages -all which was not useful since it’s a vector PDF. pdfdetach -list shows 0 attachments. It just occurred to me that pdftocairo could be useful as far as a CLI way to neuter the doc and make it usable, but that’s kind of a lossy meat-grinder option that doesn’t help with analysis.
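On the "extract just the PDF" question, one way it could be done is to carve out everything from the %PDF- header to the last %%EOF marker. A Python sketch, assuming the PDF sits in the file as one contiguous, unmodified run of bytes (which I have not verified for this file); the function name and file names are made up.

```python
def carve_pdf(data):
    """Return the bytes from the first '%PDF-' to the last '%%EOF', or None if no header."""
    start = data.find(b"%PDF-")
    if start == -1:
        return None
    end = data.rfind(b"%%EOF")
    if end < start:
        # No trailer found after the header: keep everything from the header on.
        return data[start:]
    return data[start:end + len(b"%%EOF")]


if __name__ == "__main__":
    # Hypothetical file names.
    with open("statement.bin", "rb") as fh:
        pdf = carve_pdf(fh.read())
    if pdf is not None:
        with open("statement-carved.pdf", "wb") as out:
            out.write(pdf)
```

The carved file can then be fed to pdfinfo or other analysis tools without the leading Java bytes. It should still be treated as untrusted.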
You might also dig through the PDF a bit using Didier Stevens’s tools,
Thanks for the tip. I might have to look into that. No readme… I guess this is a /use the source, Luke/ scenario. (edit: found this).
I appreciate all the tips. I might be tempted to dig into some of those options.
Your assertion that the document is malicious without any evidence is what I’m concerned about.
I did not assert malice. I asked questions. I’m open to evidence proving or disproving malice.
At some point you have to decide to trust someone. The comment above gave you reason to trust that the document was in a standard, non-malicious format. But you outright rejected their advice in a hostile tone. You base your hostility on a youtube video.
There was too much uncertainty there to inspire trust. Getoffmylan had no idea why the data was organised as serialised java.
You should read the essay “on trusting trust” and then make a decision on whether you are going to participate in digital society or live under a bridge with a tinfoil hat.
I’ll need a more direct reference because that phrase gives copious references. Do you mean this study? Judging from the abstract:
To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.
I seem to have received software pretending to be a document. Trust would naturally not be a sensible reaction to that. In the infosec discipline we would be incompetent fools to loosely trust whatever comes at us. We make it a point to avoid trust and when trust cannot be avoided we seek justification for trust. We have a zero-trust principle. We also have the rule of least privilege, which means not extending trust/permissions where it’s not necessary for the mission. Why would I trust a PDF when I can take steps to access the PDF in a way that does not need excessive trust?
The masses (security naive folks) operate in the reverse-- they trust by default and look for reasons to distrust. That’s not wise.
In Canada, and elsewhere, insurance companies know everything about you before you even apply, and it’s likely true elsewhere too.
When you move, how do they find out if you don’t tell them? Tracking would be one way.
Privacy is about control. When you call it paranoia, the concept of agency has escaped you. If you have privacy, you can choose what you disclose. What would be good rationale for giving up control?
Even if they don’t have personally identifiable information, you’ll be in a data bucket with your neighbours, with risk profiles based on neighbourhood, items being insured, claim rates for people with similar profiles, etc. Very likely every interaction you have with them has been going into a LLM even prior to the advent of ChatGPT, and they will have scored those interactions against a model.
If we assume that’s true, what do you gain by giving them more solid data to reinforce surreptitious snooping? You can’t control everything but it’s not in your interest to sacrifice control for nothing.
But what you will end up doing instead is triggering fraudulent behaviour flags. There’s something called “address fraud”, where people go out of their way to disguise their location, because some lower risk address has better rates or whatever.
Indeed for some types of insurance policies the insurer has a legitimate need to know where you reside. But that’s the insurer’s problem. This does not rationalize a consumer who recklessly feeds surreptitious surveillance. Street wise consumers protect themselves from surveillance. Of course they can (and should) disclose their new address via proper channels if they move.
Why? Because someone might take a vacation somewhere and interact from another state. How long is a vacation? It’s for the consumer to declare where they intend to live, e.g. via “declaration of domicile”. Insurance companies will harass people if their intel has an inconsistency. Where is that trust you were talking about? There is no reciprocity here.
When you do everything you can to scrub your location, this itself is a signal that you are operating as a highly paranoid individual and that might put you in a bucket.
Sure, you could end up in that bucket if you are in a strong minority of street wise consumers. If the insurer wants to waste their time chasing false positives, the time waste is on them. I would rather laugh at that than join the street unwise club that makes the street wise consumers stand out more.
It’s interesting to note that some research “discovered thousands of vulnerabilities in 693 banking apps, which indicates these apps are not as secure as we expected.”
Don’t Canadian insurance companies want to know where their customers are? Or are the Canadian privacy safeguards good on this?
In the US, Europe (despite the GDPR), and other places, banks and insurance companies snoop on their customers to track their whereabouts as a normal common way of doing business. They insert surreptitious tracker pixels in email to not only track the fact that you read their msg but also when you read the msg and your IP (which gives whereabouts). If they suspect you are not where they expect you to be, they take action. They modify your policy. It’s perfectly legal in the US to use sneaky underhanded tracking techniques rather than the transparent mechanism described in RFC 2298. If your suppliers are using RFC 2298 and not involuntary tracking mechanisms, lucky you.
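If you want to see whether a given message carries this kind of tracker before your mail client renders it, you can list the remote images in the HTML part. A small Python sketch using the standard library; the class and function names and the sample URL are made up, and it only catches plain img tags with a remote src, not CSS backgrounds or other tricks.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that would be fetched over the network."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # cid: and data: images are embedded in the message, so they are skipped.
        if src.lower().startswith(("http://", "https://", "//")):
            self.remote_images.append(src)


def find_remote_images(html_body):
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images


if __name__ == "__main__":
    # Hypothetical message body with one 1x1 remote pixel and one embedded logo.
    body = ('<p>Your statement is ready.</p>'
            '<img src="https://track.bank.example/o.gif?id=123" width="1" height="1">'
            '<img src="cid:logo">')
    print(find_remote_images(body))
```

Any URL in the result is something your client would request from the sender's side on opening, which is how the open time and IP get disclosed.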
You’re kind of freaking out about nothing.
I highly recommend Youtube video l6eaiBIQH8k, if you can track it down. You seem to have no general idea about PDF security problems.
And I’m not sure why an application would output a pdf this way. But there’s nothing harmful going on.
If you can’t explain it, then you don’t understand it. Thus you don’t have answers.
It’s a bad practice to just open a PDF you did not produce without safeguards. Shame on me for doing it… I got sloppy but it won’t happen again.
Not sure if this is relevant, but service manuals for cars older than 2014 can be found here: charm.li (no cost and enshittification-free).
I should also add that some people come for asylum but they do not follow the legal process because they are reasonably concerned that the process will fail to protect them (especially if they entered under the Trump regime). If someone enters without filing, then gets targeted (e.g. a hospital rats them out), and only then claims asylum, I don’t know what happens but obviously we need the process to be competent at separating the genuine cases from the rest. I suppose that’s the scenario you are referring to.
Asylum is a legal process. If they follow that process (which begins with claiming asylum), then of course they cease to be illegal immigrants throughout the process.
In fact, borderline human rights compromise is actually a good incentive for people to leave. Would perhaps be good for the country if those in Texas who respect human rights would move from Texas to Pennsylvania for a human rights upgrade (where also the death penalty was repealed).
But I doubt your statement is accurate considering inbound refugees are fleeing from even worse conditions w.r.t. human rights. Refugees still technically have their human right to access emergency medical treatment, they just risk getting harassed and tagged for deportation.
A month ago you would have been wrong. But indeed apparently this just changed:
“Election bets were approved legally just weeks ago, as the 2024 race headed into its home sprint.”
I did not think of the marketing angle – although even then, knowing the times that each individual opens their mail and their location has value for personalized marketing.
We are talking about banks in the case at hand. It’s unclear how many people have not come to the realization that bankers are now doing the job of cops. KYC/AML. In this particular sector, anonymization is unlikely. Banks have no limits on their snooping. They have a blank check and no consequences for overcollection. No restraint. When they get breached, they just sign people up for credit monitoring and any overcollection has the immunity of KYC law.
At best, perhaps a marketing division would choose some canned bulk mailing service which happens to give them low resolution on engagement. But even that’s a stretch because anyone in the marketing business also wants to market their own service as making the most of data collection.