Remove this from your startup script:
#!/bin/bash
there is no need for that, it actually "marks out" the entire thing
but it is not possible to write to a private channel to which the bot was added.
Is this a Slack limitation?
Hi ContemplativePuppy11
This is a really interesting point.
Maybe you can provide a pseudo-code class abstract of your current pipeline design? This will help us understand what you are trying to achieve and how to make it easier to get there
there was a problem with the index order when converting from a PyTorch tensor to a NumPy array
HealthyStarfish45 I'm assuming you are sending a NumPy array to report_image (which makes sense). If you want to debug it, you can also test TensorBoard add_image or matplotlib imshow; both will send debug images.
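For example, something along these lines should get the axis order right before reporting (a rough sketch, project/task names are placeholders):
import torch
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="debug image report")  # placeholder names

# dummy CHW tensor standing in for your model output
img_chw = torch.rand(3, 224, 224)
# permute to HWC before converting to NumPy, otherwise the channel axis ends up first
img_hwc = img_chw.permute(1, 2, 0).cpu().numpy()

Logger.current_logger().report_image("debug", "sample", iteration=0, image=img_hwc)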
the issue moving forward is that if we restart the pod we will have to manually update that again.
Can't you map the nginx configuration file? (making the changes persistent across pods)
Hi SlipperyDove40
plotly is about 4MB... trains is about 0.5MB. What's the breakdown of the packages? This seems far from the 250MB limit
I think poetry should somehow return an error if the toml is "empty", then we could detect it...
Hi @<1715900760333488128:profile|ScaryShrimp33>
hi everyone! I’m trying to save my model’s weights to storage. And I can’t do it.
See example here: None
or
task.update_output_model(model_path="/path/to/model.pt")
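For context, a minimal sketch would look like this (the bucket path, project/task names and file path are placeholders):
from clearml import Task

# output_uri controls where the weights get uploaded
task = Task.init(
    project_name="examples",
    task_name="save weights",
    output_uri="s3://my-bucket/models",
)

# ... training happens here and saves the weights locally ...

task.update_output_model(model_path="/path/to/model.pt")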
MagnificentPig49 I was not aware of jsonargparse. From what I understand it's a nicer way to parse json configuration files, with an argparse-like interface. Did I get that correctly?
Regarding the missing argparse arguments, you are correct, the auto-magic is not working since jsonargparse calls an internal ArgParser function and not the external one (hence we miss it).
The quickest fix is adding the following line before you call parse_args(): task.connect(parent_parser)
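Roughly like this (a minimal sketch; project/task names and the --lr argument are just placeholders, parent_parser stands for your jsonargparse parser):
from clearml import Task
from jsonargparse import ArgumentParser

task = Task.init(project_name="examples", task_name="jsonargparse workaround")  # placeholder names

parent_parser = ArgumentParser()
parent_parser.add_argument("--lr", type=float, default=0.001)  # placeholder argument

# connect the parser explicitly, since the argparse auto-magic misses jsonargparse's internal call
task.connect(parent_parser)
args = parent_parser.parse_args()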
Hi @<1697419082875277312:profile|OutrageousReindeer5>
Is NetApp S3 protocol enabled or are you referring to NFS mounts?
Hi JitteryCoyote63
If you want to stop the Task, click Abort (Reset will not stop the task or restart it, it will just clear the outputs and let you edit the Task itself). I think we witnessed something like that due to DataLoader multiprocessing issues, and I think the solution was to add multiprocessing_context='forkserver' to the DataLoader (see the snippet below): https://github.com/allegroai/clearml/issues/207#issuecomment-702422291
Could you verify?
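Something like this (a minimal sketch with a dummy dataset, just to show where the argument goes):
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy dataset standing in for your own Dataset
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# 'forkserver' sidesteps the fork-related worker issues from the linked GitHub issue
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    multiprocessing_context="forkserver",
)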
Hi MagnificentSeaurchin79
This means tensorflow was not directly imported in the repository (which is odd, it might point to the auto package analysis failing to find the package; if this is the case please let me know)
Regardless, if you need to make sure a package is listed in the requirements, either import it or use Task.add_requirements('tensorflow') or Task.add_requirements('tensorflow', '2.3.1')
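For example (a short sketch; note add_requirements has to be called before Task.init, project/task names are placeholders):
from clearml import Task

# make sure the agent installs tensorflow even though it is not imported directly
Task.add_requirements("tensorflow", "2.3.1")

task = Task.init(project_name="examples", task_name="requirements demo")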
Regarding the demo app, this is just a default server that lets you start playing around with ClearML without needing to set up any of your own servers or sign up
That said, I would recommend signing up (totally free) on the community server
https://app.community.clear.ml/
Try adding this environment variable:
export TRAINS_CUDA_VERSION=0
But this config should almost never need to change!
Exactly the idea 🙂
Notice that the password (initially random) is also fixed on your local machine, for the exact same reason
Interesting...
We could follow up on the .env configuration, and allow clearml-task to add configuration files from the command line. This would be relatively easy to add. We could expand the Environment support (that somewhat exists), and add the ability to read variables from .env and add them to a "hyperparameter" section named Environment. wdyt?
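Rough sketch of the idea (assuming python-dotenv for reading the file; names are placeholders):
from dotenv import dotenv_values  # python-dotenv, assumed available
from clearml import Task

task = Task.init(project_name="examples", task_name="env section sketch")

# read key/value pairs from the local .env file
env_vars = dict(dotenv_values(".env"))
# register them under a dedicated "Environment" hyperparameter section
task.connect(env_vars, name="Environment")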
sdk.storage.cache.size.cleanup_margin_percent
Hi ReassuredTiger98
This is actually future-proofing the cache mechanism and allowing it to be "smarter", i.e. clean based on cache folder size instead of cache folder entries; this is currently not available
sdk.storage.cache
parameters for the agent?
For both local execution and with an agent
When are datasets deleted if I run local execution?
When you hit the cache entry limit (100 if I recall). This can a...
Hi @<1597762318140182528:profile|EnchantingPenguin77>
--ipc=host actually means that there is no need for the --shm-size argument; it means you have access to the entire shared memory on the host machine. I'm assuming that the GPU card just does not have enough VRAM ...
None
BTW: what happens if you pass the same s3://bucket to Task.init output_uri? I assume you are getting the same access issue?
Hi PungentLouse55
Are you referring to the example code?
GiganticTurtle0 the fix was not applied in 1.1.2 (which was a hotfix after the pyjwt interface changed and broke compatibility)
The type hint fix is in the latest RC: pip install clearml==1.1.3rc0
I just verified with your example
apologies for the confusion, we will release 1.1.3 soon (we just need to make sure all tests pass with a few PRs that were merged)
So what is the mechanism by which you "automagically" pick things up? (For information, I don't think this is relevant to our use case.)
If you use joblib.dump (which is like pickle but safer/faster) it will be auto logged
https://github.com/allegroai/clearml/blob/4945182fa449f8de58f2fc6d380918075eec5bcf/examples/frameworks/scikit-learn/sklearn_joblib_example.py#L28
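In short it looks something like this (condensed from the linked example; project/task names are placeholders):
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from clearml import Task

task = Task.init(project_name="examples", task_name="joblib auto-logging")

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# the dump call is picked up automatically and registered as an output model
joblib.dump(model, "model.pkl")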
Thanks @<1523701713440083968:profile|PanickyMoth78> for pinging, let me check if I can find something in the commit log, I think there was a fix there...