Hacker News

amwolff
Backblaze seemingly does not support files greater than 1 TB wadetregaskis.com

aosaigh15 days ago

I like the log message "BadBadBadChunkRecord". I wonder what BadChunkRecord and BadBadChunkRecord are? Is there a VeryBadBadBadChunkRecord?

I've been trying to replace all my various backups (Time Machine, Backblaze, CCC) with a single tool (Arq - https://www.arqbackup.com/)

The Backblaze client is the next to go. To be honest, I haven't had too many issues with it, but the restore interface in particular is pretty poor and slow.

jagged-chisel15 days ago

I think it’s a triple negative. A BadChunkRecord is, well, a bad chunk record. A BadBadChunkRecord is a chunk record that’s bad at being bad, so it’s good. One can see the logical progression that leads BadBadBad… to be a bad one.

Of course, being bad at being bad could just be a different kind of bad. A good BadChunkRecord explains the problem with the chunk. A bad BadChunkRecord might have too little information to be a good BadChunkRecord. A bad BadBadChunkRecord could be an unhandled exception with no other details and the fact that a ChunkRecord is even involved is assumed and therefore questionable.

dmd15 days ago

The møøse responsible for the BadChunkRecord have been sacked.

handsclean15 days ago

We can’t neglect the possibility that BadBadBad is a term that the Backblaze code base now defines directly, not just in terms of its etymological root “bad”. Its name and definition may overlap the common word, but also carry additional meaning, like “logged and non-recoverable”.

smatija15 days ago

That's a base-4 number system. Next in the progression is WorseChunkRecord, then you have BadWorseChunkRecord and so on up to WorseWorseWorseChunkRecord, when you finally get to HorribleChunkRecord.

Moru15 days ago

Nah, NGU standard is bad, badbad, badbadbad, x4 bad, x5 bad and so on

Neil4415 days ago

I like the B2 CLI client. Having tried various freeware backup utilities, the local database always becomes unwieldy due to the number of files and changes being tracked. B2 somehow seems performant and needs no local db; you set the retention settings via the bucket.

immibis15 days ago

B2 is a separate product (presumably sharing the same underlying storage) from Backblaze's original backup product.

graemep15 days ago

This seems to be about "computer backup", not B2 though.

The documentation implies a 10TB limit for large files on B2.

Crosseye_Jack15 days ago

At least the file wasn't Michael Jackson bad (bad, bad, really, really bad).

homebrewer15 days ago

I think this problem has already been solved without increasing word length: we should just use doubleplusungood and increment from there.

mrighele14 days ago

It's a bit field with three values, encoded as a camel-cased identifier. This time three things went bad at the same time, but in less dramatic cases you may simply have a GoodBadGoodChunkRecord or a BadBadGoodChunkRecord. Needless to say, GoodGoodGoodChunkRecord denotes regular conditions.

leftnode15 days ago

Funny quip aside, thanks for bringing Arq to my attention. This looks excellent and isn't enshittified.

bluehatbrit15 days ago

I'll also vouch for Arq; I've been using it for several versions now and they've all been pretty solid. The website is a bit difficult to navigate, but the tool itself works well. I use it across Windows and macOS machines at home and have never had an issue with it, for both backup and restore.

tzs15 days ago

I've used Arq for many years. The only thing that occasionally annoys me is that it will get an error like

> Error: /Users/tzs/Library/Biome/streams/restricted/ProactiveHarvesting.Mail/local/91257846325132: Failed to open file: Operation not permitted

but when I check that file I have no trouble opening it. I can't see anything in the permissions of it or of any of the directories on the path that would prevent opening it.

Then I'd have to search the net to find out what the heck that is and whether or not it is safe to add an exclusion for it or for one of the directories on the path.

I eventually figured out that before searching the net what I should do is create a new backup plan and take a look at the exclusions in that new backup plan. Often I'd then find that there is a default exclusion that covers it. (In this particular example ~/Library/Biome is excluded by default).

When they update the default exclusion list that is used for new backup plans it does not update the defaults in existing backup plans. Evidently either Biome did not exist several years ago when I made my backup plan, or it was not a source of errors and so was not in my default exclusions.

So now I occasionally create a new backup plan, copy its default exclusions, delete the new backup plan, and then compare the default exclusions with those of my backup plans to see if there is any I should add.

johnbellone15 days ago

I like Bad³ChunkRecord much better.

redleader5515 days ago

I'm speculating here based on my experience working on a storage fabric product in the past.

First - it's sensible to have limits to the size of the block list. These systems usually have a metadata layer and a block layer. In order to ensure a latency SLO, some reasonable limits to the number of blocks need to be imposed. This also prevents creating files with many blocks of small sizes. More below.

Second - Usually blocks are designed to match some physical characteristic of the storage device, but 10 MB doesn't match anything. 8 MB might be a block size for an SSD or an HDD, or 72 MB = 9 * 8 MB might be a block size with 9:16 erasure coding, which they likely have for backup use-cases. That being said, it's likely the block size is not fixed at 10 MB and could be increased (or decreased). Whether or not the client program that does the upload is aware of that is a different problem.
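
For illustration, a rough Python sketch of the arithmetic behind the reported failure, assuming a fixed 10 MB block and a 100,000-block cap per file (both inferred from the log in the article, not documented internals):

    # Implied maximum file size if a file is split into fixed 10 MB blocks
    # and the backend refuses more than 100,000 blocks per file.
    # Both numbers are assumptions inferred from the log, not documented limits.
    BLOCK_SIZE_MB = 10
    MAX_BLOCKS = 100_000

    max_file_tb = BLOCK_SIZE_MB * MAX_BLOCKS / 1_000_000  # MB -> TB (decimal)
    print(f"Implied cap: {max_file_tb} TB")  # prints 1.0, matching the ~1 TB failure point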

Aeolun15 days ago

This seems at the same time reasonable (no files larger than 1TB sounds fair; I don't think anyone reads ‘no file size restrictions’ and imagines a one-terabyte file) and not, because support should realize that the file size might have something to do with the issue and get someone technical involved.

davidt8415 days ago

I read "no file size restrictions" and assume I can create a file as large as the storage space I can afford.

What else would I assume?

If there's a 1TB limit I would expect that to be described as "create files up to 1TB in size".

eterevsky15 days ago

I would assume some limit no higher than 2^64, since all common file systems have file size limits: https://en.wikipedia.org/wiki/Comparison_of_file_systems

8organicbits15 days ago

When common file systems like NTFS have limits like "16 TiB (17.59 TB) to 8 PiB (9.007 PB)", a 1TB limit seems below what is advertised.

swiftcoder15 days ago

Undocumented limits are a classic way to blow up traffic to your support centre. Please folks, if your service has limits, document them!

michaelt15 days ago

Documenting wouldn't hurt, I'll agree.

But I doubt they're getting much support burden from this; the product targets home users, providing fixed-price single-computer backups. This isn't their B2 blob storage offering.

The most bloated triple-A games out there are only ~100GB. The entirety of OpenStreetMap's data, in XML format, only reaches 147 GB [1]. The latest and greatest 670-billion-parameter LLM is only 600 GB... and they ship it as 163 x 4.3GB files [2]. Among PC gamers [3] about 45% of people have less than 1TB of hard drive space, total.

IMHO it's very unlikely they have a big support burden from this undocumented limitation.

It's clearly a bug though - trying to upload the same file in every single backup, failing every time? There's no way that's the intended behaviour.

[1] https://planet.openstreetmap.org/ [2] https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main [3] https://store.steampowered.com/hwsurvey

birdman313115 days ago

You have never played Ark, I take it. I am sitting at ~700 GB between Ark: Survival Evolved and Survival Ascended.

OtherShrezzing15 days ago

Documenting also helps your own engineers understand that there are limitations. The back-end team might understand that there's a 1TB limit. If the front-end team doesn't, they could cost you a tonne of bandwidth uploading 99% of a 1.001TB file to the server before it gets rejected.

madeofpalk15 days ago

If it actually wasn't supported, surely the client would know this and wouldn't attempt to continuously upload it.

Seems like there is a bug somewhere - whether in artificially imposing a file size limit, or in not imposing the limit correctly.

jagged-chisel15 days ago

> … surely the client would know this

”Should” most certainly. However, often client teams don’t work so closely with backend teams. Or the client team is working from incorrect documentation. Or the client team rightfully assumes that when it reports a file size before uploading, the backend will produce an acceptable error to notify the client.

baobabKoodaa15 days ago

Oh my, this almost sounds like it just might have unintended bad effects when you lie in your marketing about what limits your service has.

mort9615 days ago

It's a back-up service, not a cloud storage service. If I have a 1TB file on my machine, and I want to back up my machine, I want that 1TB file to be backed up.

londons_explore15 days ago

And a 1 TB file is pretty common if you take a backup of a 1 TB disk drive into a disk image.

joseda-hg15 days ago

Not arguing your main point, but why wouldn't you compress a backup?

londons_explore15 days ago

My disks tend to be full of films that barely compress.

mort9615 days ago

Certainly. I've regularly had single image files and tar.gz files of that magnitude on my machines, which are precisely backups of old laptops from when I got a new one, etc.

dpacmittal15 days ago

Why couldn't they just say there's a 1 TB limit?

baobabKoodaa15 days ago

This is what bothers me as well. I don't think you would find a single customer who would go "what, only 1TB files?"

I guess different kinds of people are drawn to different careers, and people with loose morals, hubris, and a propensity to lie are drawn to marketing. Don't even need an incentive to lie, it's just who they are.

pbalau15 days ago

> I don't think you would find a single customer who would go "what, only 1TB files?"

I think you will find that there are people who see "limit: <humongous>" and instead pick the "no limit" (or no limit stated) option.

> people with loose morals, hubris, and a propensity to lie are drawn to marketing

You are jumping the gun here.

kaivi15 days ago

These limits are not reasonable at all. You are going to curse Backblaze or AWS S3 before you learn to never pipe the output of pg_dump into an object store like that.

atYevP15 days ago

Yev here -> we're actively looking into it; we can upload files >1TB, so it's a bit tough to chase down.

atYevP15 days ago

Yev here from Backblaze -> This is something that our engineering team is taking a look at. We've been able to successfully upload files over 1TB internally, so we're trying to chase down what's triggering those error messages. From what I can gather the client should certainly upload files over 1TB, but something wonky is going on in this case. We're looking at it though!

bakugo15 days ago

The Backblaze computer backup app is kind of a mess in general. The only reason I still use it is because of the unbeatable price, there are many better alternatives if cloud storage prices aren't an issue for you.

I've had many issues with it, and poking around at its logs and file structures makes it pretty obvious that it's very bad code. There are numerous different, inconsistent naming conventions and log formats.

I once tried to edit the custom exclusions config file (bzexcluderules) and gave up shortly after because the format of the file is both completely nonsensical and largely undocumented - whoever implemented it didn't know how to do proper wildcard matching and was likely unaware of the existence of regular expressions, so they came up with an unholy combination of properties such as ruleIsOptional="t" (no idea what this means), skipFirstCharThenStartsWith=":\Users\" (no comment on this one), contains_1 and contains_2 (why not contains_3? sky's the limit!), endsWith="\ntuser.dat" and hasFileExtension="dat" (because for some reason checking the extension with endsWith isn't enough?), etc.

southernplaces714 days ago

>there are many better alternatives if cloud storage prices aren't an issue for you.

What are a couple that you particularly recommend for this? I've been using B2 after abandoning the disaster that SpiderOak has become (I was a long-term user and stuck around out of habit), and I'm not exceptionally happy with B2 either.

bakugo14 days ago

Restic is my personal favorite for straightforward file backups. It's simple and well-designed, integrates with rclone meaning it supports any file transfer protocol or cloud storage service you can imagine, and has a decently large community surrounding it.

https://restic.net/

https://github.com/rubiojr/awesome-restic
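
If it helps, a minimal sketch (Python shelling out to the tools) of pointing restic at an rclone remote; the remote name, repository path, and password handling are placeholders, not a recommended setup:

    import os
    import subprocess

    # Hypothetical rclone remote and repository path.
    repo = "rclone:myremote:backups/laptop"
    env = {**os.environ, "RESTIC_PASSWORD": "use-a-real-secret-store"}

    # One-time initialisation of the repository.
    subprocess.run(["restic", "-r", repo, "init"], env=env, check=True)

    # Back up a directory; restic chunks, deduplicates and encrypts client-side.
    subprocess.run(
        ["restic", "-r", repo, "backup", os.path.expanduser("~/documents")],
        env=env, check=True,
    )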

southernplaces714 days ago

Thanks for getting back with an answer. Still curious though: to what cloud system or platform do you back up? Restic is a solution for organizing and managing the backup process, but not for the remote storage itself.

bakugo14 days ago

Literally any remote storage you can imagine works. I've personally used Hetzner storage boxes and they work fine, and their price is very good (though still not as good as Backblaze Backup if you have double-digit TBs like me).

southernplaces714 days ago

I'll give Hetzner a shot. Been looking into it and as for Backblaze, I simply can't stand their exclusion-based selection system for files to back up.

actuallyalys15 days ago

The log suggests it doesn’t immediately stop upon noticing the file is over 1 TB—the second entry comes two minutes later when it’s trying to upload the 100,001st block. This makes me wonder whether 100,000 blocks is an intentional limit on file size or whether it’s some sort of internal threshold that isn’t meant to prevent uploading files of that size but is buggy. Perhaps files bigger than 1 TB have slightly different behavior by design and that behavior was broken?

Since 2 minutes isn’t long enough to upload 1 TB, it’s either looking at the blocks out of order and skipping ahead to block 100,001 for some reason or noticing that’s the first block that hasn’t been uploaded yet.

Another comment suggests 10 MB isn't a fixed limit. In that case, I wonder if 100,000 blocks is the limit and the intention of the designers is that users with such large files would increase their block size. If so, that should of course be documented, and the error surfaced clearly in that scenario. Although it's still a bit strange that it would apparently try to upload the 100,001st block and not immediately warn the user that the file size is incompatible with their block size.
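
A quick sketch of that reading: if the cap really is 100,000 blocks and the block size is adjustable, a larger file simply needs proportionally larger blocks. The numbers below are illustrative, not Backblaze's actual parameters:

    import math

    MAX_BLOCKS = 100_000       # assumed per-file block cap
    file_size_gb = 1500        # hypothetical 1.5 TB file

    # Smallest block size (in MB) that fits the file under the cap.
    min_block_mb = math.ceil(file_size_gb * 1000 / MAX_BLOCKS)
    print(f"Blocks must be at least {min_block_mb} MB")  # prints 15, vs. the 10 MB seen in the log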

tobyhinloopen15 days ago

I'm going to ask the wrong question here, but... What files are >1TB? Raw video footage?

earth-adventure15 days ago

Encrypted disk volumes from virtual machines for example.

bifftastic15 days ago

Backblaze does not back up disk images. It's a documented limitation.

I just found this out the hard way.

dataflow36015 days ago

There are certain file types that are excluded by default, but you can adjust these in BB's Prefs.

(Disclaimer: I'm a BB customer with 70TB backed up with them.)

notachatbot12315 days ago

On what kind of plan?

tobyhinloopen15 days ago

Backblaze allows "unlimited" backups for any drive attached to your PC. So if you have >=70TB on your machine...

Hamuko15 days ago

Isn’t there just the one plan?

MurkyLabs15 days ago

No, there are 2 from what I can see: there's the computer backup, which is $99/year, and then there's the B2 backup, which is $6/TB/month.

Hamuko15 days ago

Those aren't different plans, they're different products. B2 is also not a backup product but an object storage product.

baobabKoodaa15 days ago

That can't be right. After all, "Backblaze automatically finds your photos, music, documents, and data and backs it up. So you never have to worry."

bifftastic15 days ago

Actually to be fair on BB you can override this behaviour. But you have to know...

ykonstant15 days ago

Can you share the know?

justusthane15 days ago

You just have to go into the preferences on the Backblaze client and adjust the exclusions.

Mac: https://www.backblaze.com/computer-backup/docs/configure-exc...

Windows: https://www.backblaze.com/computer-backup/docs/configure-exc...

baobabKoodaa15 days ago

Exactly. Just adjust the exclusions. See, you never have to worry. Backblaze automatically backs up all your data for you.

chlorion15 days ago

A disk image is just a file though. Does it do some sort of analysis of files and block disk images somehow?

Maybe you mean that it doesn't do image-level backups by default?

fl0id15 days ago

It doesn't do image-level backups anyway. It just excludes certain file types by default, and probably also cache directories, as far as I remember.

ktm5j15 days ago

I wrote some code a while ago that gpg-encrypted ZFS snapshots and stored them in Backblaze B2. Just because they don't have software that backs up disk images for you doesn't mean you can't make that happen yourself!
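
Presumably something along these lines; a rough Python sketch of that kind of pipeline (zfs send piped through gpg into object storage via rclone), with made-up dataset, recipient, and remote names:

    import subprocess

    snapshot = "tank/data@2024-01-01"                 # hypothetical ZFS snapshot
    remote = "b2:my-bucket/zfs/2024-01-01.zfs.gpg"    # hypothetical rclone B2 remote

    # Equivalent of: zfs send ... | gpg --encrypt ... | rclone rcat <remote>
    send = subprocess.Popen(["zfs", "send", snapshot], stdout=subprocess.PIPE)
    gpg = subprocess.Popen(
        ["gpg", "--encrypt", "--recipient", "backup@example.com"],
        stdin=send.stdout, stdout=subprocess.PIPE,
    )
    send.stdout.close()  # let zfs send receive SIGPIPE if gpg exits early
    subprocess.run(["rclone", "rcat", remote], stdin=gpg.stdout, check=True)
    gpg.stdout.close()
    gpg.wait()
    send.wait()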

TrackerFF15 days ago

With any GHz sampling system, you can rack up TB of data very fast if the system/your setup allows for it.

Now imagine how much data the Large Hadron Collider can generate in a second.

eptcyka15 days ago

Disk images, archives.

Symbiote15 days ago

I've handled scientific data sets of that size (per file).

vman8115 days ago

I'm guessing poorly configured log files?

baobabKoodaa15 days ago

VeraCrypt containers

Borg315 days ago

The question is where that limit comes from. It sounds weird. A 40-bit file size record? Like, why?

I recently was fixing my own tool here and had to shuffle some fields. I settled on 48-bit file sizes, so 256TB for a single file. Should be enough for everyone ;)
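
For what it's worth, the round numbers line up either way; a quick check (purely illustrative, these field widths are speculation, not anything Backblaze has documented):

    # 2^40 bytes is roughly 1.1 TB (1 TiB); 2^48 bytes is 256 TiB.
    for bits in (40, 48):
        size = 2 ** bits
        print(f"{bits}-bit size field -> {size:,} bytes (~{size / 1e12:.2f} TB)")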

spuz15 days ago

It sounds like an arbitrary limit set by some engineer without too much thought and without the incentive to document their decision. I do this sometimes as well when dealing with untrusted data. Sometimes you don't want to have to think about the knock-on effects of supporting large file sizes, and it's easier to just tell the client please don't do that.

raverbashing15 days ago

Honestly? Be a good customer. Help them help you, and help yourself in the meantime.

Having a 1TB file sucks in way more ways than just "your backup provider doesn't support it". Believe me.

aleph_minus_one15 days ago

The problem is that it is often possible to solve the problem going forward, but you cannot get rid of the existing large files (if only for legacy reasons).

One example: A Git repository that I created contained files that are above GitHub's file size limits. The reason was a bad design in my program. I could fix this design mistake, so that all files in the current revision of my Git repository are now "small". But I still cannot use GitHub because some old commit still contains these large files. So, I use(d) BitBucket instead of GitHub.

Symbiote15 days ago

You can rewrite the history of the repository to remove the huge files, if you wish.

Of course, all the commit identifiers change.

aleph_minus_one15 days ago

> You can rewrite the history of the repository to remove the huge files, if you wish.

These old files nevertheless contain valuable data that I want to keep. Converting them to the new, much improved format would take serious effort.

znpy15 days ago

On the other hand this means shifting responsibility to deal with past mistakes onto somebody else.

Are you really sure you cannot rebase/edit/squash the commits in that git repository?

Yes, commit hashes will change, but will it actually be a problem?

aleph_minus_one15 days ago

> On the other hand this means shifting responsibility to deal with past mistakes onto somebody else.

This was just one example. Let me give another one:

In some industries, for audit reasons, you have to keep lots of old data in case some auditor or regulator asks questions. In such a situation, the company cannot say "hey sorry, we changed our data format, so we cannot provide you the data that you asked for" - this will immediately get the company into legal trouble. Instead, while one is allowed to improve things, the ability to handle the old data has to be fully preserved.

rschiavone15 days ago

If the repository is a personal one managed only by you, you can squash the commits so the large files disappear from the history.

You can do that on shared repos too, but that would cause a bad headache for other maintainers.

davidt8415 days ago

Perhaps you could say something concrete, rather than vague waffle and assertions?

54235423423515 days ago

Perhaps you could say something concrete, rather than vague criticism?

davidt8415 days ago

What am I supposed to say in response to a comment conveying the message "be a good customer" with no further elaboration and the source "believe me"?
