
WackyRabbit7 This is a JSON representation of the entire plot (basically how Plotly sees it).
What you are after is: full_json[0]['cells']['values']
which is a list of lists (row order) of the table cells.
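As a hedged sketch, with a made-up nested structure standing in for the real reported-plot JSON (only the access path matters here), extracting the cell values looks like:

```python
import json

# Hypothetical stand-in for the plot JSON of a reported table;
# the real content comes from the server, only the path is relevant.
full_json = json.loads("""
[
  {
    "type": "table",
    "header": {"values": ["name", "score"]},
    "cells": {"values": [["exp-a", 0.91], ["exp-b", 0.88]]}
  }
]
""")

# The nested lists holding the table values:
values = full_json[0]["cells"]["values"]
print(values)  # [['exp-a', 0.91], ['exp-b', 0.88]]
```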
OutrageousSheep60 so this should work, no?
ds.upload(output_url='gs://<BUCKET>/', compression=0, chunk_size=100000000000)
Notice that chunk_size is the maximum size (in bytes) per chunk, so it should basically be very large.
Hmm, we could add an optional test for the Python version, and then fail the Task if the Python version is not found. wdyt?
Hmm, interesting, why would you want that? Is this because some of the packages will fail?
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
Try to add '--network host' to the docker args on the task you are launching
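If you want every task launched by that agent to get the flag (rather than editing the docker args on the task itself in the UI), one option, assuming the standard clearml.conf layout, is the agent's extra docker arguments:

```
# clearml.conf on the agent machine
agent {
    extra_docker_arguments: ["--network", "host"]
}
```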
CrookedWalrus33 this is odd, I tested the exact same code.
I suspect something with the environment maybe?
What's the Python version / OS? Also, can you send a full pip freeze?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
Yes this is odd, it should add the content-type of the file (for example "application/x-tar"), but you are getting N...
Well that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init
and when you are done call Task.close
. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent
which could help you spin those experiments on as many machines as you l...
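The manual pattern above, Task.init at the start and Task.close at the end with one experiment per process, can be sketched roughly like this (the Task calls are commented out so the skeleton runs without a clearml setup; the project/experiment names are made up):

```python
from multiprocessing import Pool

def run_experiment(name):
    # In a real setup you would open a task here, e.g.:
    #   from clearml import Task
    #   task = Task.init(project_name="demo", task_name=name)
    result = f"{name}: done"  # stand-in for the actual experiment code
    # ...and close it when the experiment finishes:
    #   task.close()
    return result

if __name__ == "__main__":
    # Each experiment lives in its own process, so they can run in parallel
    with Pool(2) as pool:
        print(pool.map(run_experiment, ["exp-a", "exp-b"]))
```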
Hi SourSwallow36
What do you mean by "Log each experiment separately"? How would you differentiate between them?
DM me the entire log, I would assume this is something with the configuration
It will also allow you to pass them to Hydra (either as overrides, or by directly editing the entire Hydra config)
The package detection is done when running the code on your laptop, and this is when it first logs the packages and versions. Following it, what do you have on your laptop? OS/Conda/Python
Since I can't use the
torchrun
command (from my tests, clearml won't use it on the clearml-agent), I went with the
@<1556450111259676672:profile|PlainSeaurchin97> did you check this example?
None
Notice that if you are using TB, everything you report to the TB will appear as well 🙂
SmilingFrog76 this is not a weird mechanism at all, this is a proper HPC scheduler 🙂
trains-agent
is not actually aware of other nodes; it is responsible for launching a Task on its own hardware (with whatever configuration it was set up with). What can be done is to use the trains-agent
inside a 3rd party scheduler and have the scheduler allocate the node and trains-agent spin the experiment. There is a k8s example here: basically pulling jobs for the trains-server queue and pushing ...
Hmm StrangePelican34
Can you verify you call Task.init before TB is created ? (basically at the start of everything)
I was thinking such limitations would exist only for published Tasks.
A published Task could not be "marked started" even with the force flag.
Hi @<1561885941545570304:profile|PunyKangaroo87>
What do you mean by store data locally?
Like clearml-data? I.e Dataset?
You can always use file:///root/path/folder as the destination; this will store everything in the local folder. Is that it?
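In case it helps, a local folder maps to a file:// URL like so (plain pathlib, no clearml needed):

```python
from pathlib import PurePosixPath

# A local destination folder expressed as a file:// URL
dest = PurePosixPath("/root/path/folder").as_uri()
print(dest)  # file:///root/path/folder
```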
Hi MortifiedCrow63
Sorry, getting GS credentials is taking longer than expected 🙂
Nonetheless it should not be an issue (model upload is essentially using the same StorageManager internally)
Hi RoughTiger69
unfortunately, the model was serialized with a different module structure - it was originally placed in a (root) module called
model
....
Is this like a pickle issue?
Unfortunately, this doesn't work inside clear.ml since there is some mechanism that overrides the import mechanism using
import_bind
.
__patched_import3
What error are you getting? (meaning why isn't it working)
Basically it is the same as "report_scatter2d"
WhimsicalLion91
What would you say is the use case for running an experiment with iterations?
That could be loss value per iteration, or accuracy per epoch (iteration is just a name for the x-axis in a sense , this is equivalent to time series)
Make sense?
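Put differently, a scalar series is just (iteration, value) pairs; a minimal sketch (the reporting call is commented out as it needs a running task, and the loss values are made up):

```python
# A loss curve is just one value per iteration, i.e. a time series
series = []
for iteration in range(5):
    loss = 1.0 / (iteration + 1)  # stand-in for a real training loss
    # With clearml you would report each point, roughly:
    #   Logger.current_logger().report_scalar(
    #       "loss", "train", value=loss, iteration=iteration)
    series.append((iteration, loss))

print(series[0])  # (0, 1.0)
```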
PanickyMoth78 thank you for the mock code, I can verify it reproduces the issue. It seem that for some reason (bug) when the same function is called multiple times it "collects" parents, hence the odd graph,
BTW: if you want to see exactly what is passed to the step you can press on the step's full_details, and see the hyperparameter section.
I'll make sure we fix this bug in the next RC.
SmarmySeaurchin8
When running in "dev" mode (i.e. writing the code) only packages imported directly are registered under "installed packages" , then when the agent is executing the experiment, it will update back the entire environment (including derivative packages etc.)
That said you can set detect_with_pip_freeze
to true (in trains.conf) and it will basically store the entire pip freeze.
https://github.com/allegroai/trains/blob/f8ba0495fb3af1f99732fdffbbccd2fa992934a4/docs/trains.c...
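The relevant snippet, assuming the standard trains.conf layout, would look something like:

```
# trains.conf
sdk {
    development {
        # store the full `pip freeze` instead of only directly-imported packages
        detect_with_pip_freeze: true
    }
}
```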
Basically when running remotely, the first argument to any configuration (whether object or string, or whatever) is ignored, right?
Correct π
Is there a planned documentation overhaul?
you mean specifically for the connect_configuration ? or in general on the connect
approach rationale ?
You're suggesting that the false is considered a string and not a bool?
The clearml-server always stores the values as strings (serializing them), and the casting is done when they are passed back to the code at runtime. The issue here is there is actually no "way" to tell the argparser this is a boolean (basically any value that is passed is treated as a string). What I think we should do is fix the casting function so that if this is exactly the same value we use the default value (i.e. boole...
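The underlying Python behavior, independent of clearml, is that argparse with type=bool just calls bool() on the incoming string, and any non-empty string is truthy:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--flag", type=bool, default=False)

# The value arrives as a string, and bool("false") is True,
# so argparse cannot recover the intended False:
args = parser.parse_args(["--flag", "false"])
print(args.flag)  # True
```

This is why falling back to the default value when the serialized string matches it exactly is a reasonable fix for the casting function.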
ShortElephant92 yep, this is definitely an enterprise feature 🙂
But you can configure user/pass on the open source version, and even store the passwords hashed if you need.
Interesting!
Wouldn't Dataset (class) be a good solution ?