ValueError: Missing key and secret for S3 storage access
Yes, that makes sense. I think we should make sure we do not suppress this warning; it is too important.
Bottom line: the S3 configuration section is missing from your clearml.conf
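Something along these lines should do it (a minimal sketch, assuming S3 access; the key, secret, and region values are placeholders):
sdk {
    aws {
        s3 {
            # default credentials used by the SDK for S3 access
            key: "YOUR_ACCESS_KEY"
            secret: "YOUR_SECRET_KEY"
            region: "us-east-1"
        }
    }
}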
- Artifacts and models will be uploaded to the output URI; debug images are uploaded to the default file server. This can be changed via the Logger.
- Hmm is this like a configuration file?
You can do:
local_text_file = task.connect_configuration('filenotingit.txt')
Then open 'local_text_file'; it will create a local copy of the data at runtime, and the content will be stored on the Task itself. - This is how the agent installs the python packages, but if the docker already contains th...
Notice both need to be str
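For context, a minimal sketch of how that usually looks in the task code (the project/task names here are just placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='config demo')
# returns a local file path; when executed by the agent the content comes from the Task itself
local_text_file = task.connect_configuration('filenotingit.txt')
with open(local_text_file, 'rt') as f:
    content = f.read()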
btw, if you need the entire folder just use StorageManager.upload_folder
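Something like this (a minimal sketch; the local folder path and bucket URL are placeholders):
from clearml import StorageManager

StorageManager.upload_folder('/path/to/local/folder', 's3://my-bucket/some/prefix')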
Hi OddShrimp85
You mean something like clearml-serving ?
None
yes, looks like. Is it possible?
Sounds odd...
What's the exact project/task name?
And what is the output_uri?
OutrageousGrasshopper93 tensorflow-gpu is not needed, it will convert tensorflow to tensorflow-gpu based on the detected cuda version (you can see it in the summary configuration when the experiment runs inside the docker)
How can i set the base python version for the newly created conda env?
You mean inside the docker ?
Hi MassiveBat21
CLEARML_AGENT_GIT_USER is actually a git personal token
The easiest is to have a read-only user/token for all the projects (see the sketch below).
Another option is to use the ClearML vault (unfortunately not part of the open source) to automatically take these configuration on a per user basis.
wdyt?
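As a concrete sketch of the read-only user/token option (the values are placeholders), the credentials can be passed to the agent via environment variables before starting it:
export CLEARML_AGENT_GIT_USER="git-readonly-user"
export CLEARML_AGENT_GIT_PASS="personal-access-token"
clearml-agent daemon --queue default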
EnviousStarfish54 quick update, regardless of the logging.config.dictConfig issue, I will make sure that even when the logger is removed, the clearml logging will continue to function
The commit will be synced after the weekend
Actually scikit implies joblib (so you should use scikit, anyhow I'll make sure we add joblib as it is more explicit)
Hmm should not make a diff.
Could you verify it still doesn't work with TF 2.4 ?
I think you are correct. Let me make sure we add that (docstring and documentation)
hmmm, somehow I have a bad feeling about it... Could you check the log? It should say something like "Collecting torch==1.6.0.dev20200421+cu101 from https://"
It should be right at the top of the installation. What do you have there?
Hi EagerOtter28
I think the replacement should happen here:
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/clearml_agent/helper/repo.py#L277
Hi SuperficialGrasshopper36
You are definitely onto a bug
It seems that with the new poetry, we fail to set the target venv (basically it decides for itself); from that point, the execution of the actual code is not running inside the correct venv.
Could you please open a GitHub issue?
I want to make sure this will be addressed
Anyhow, if StorageManager.upload was fast, upload_artifact is calling that exact function, so I don't think we actually have an issue here. What do you think?
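For reference, a minimal sketch of the call in question (project/task/artifact names and the file path are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact demo')
# upload_artifact goes through the same storage upload code path mentioned above
task.upload_artifact(name='data_file', artifact_object='/path/to/local/file.csv')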
Hi HurtStarfish47
I see "Add image.jpg" being printed for all my data items ...
I assume you forgot to call upload? The sync "marks" files for upload / deletion, but the upload call actually does the work.
Kind of like git add / push, if that makes sense?
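To illustrate the add/push analogy, a minimal sketch with the Dataset interface (the dataset/project names and folder path are placeholders):
from clearml import Dataset

dataset = Dataset.create(dataset_name='my_dataset', dataset_project='examples')
dataset.sync_folder('/path/to/local/data')  # "marks" added / modified / removed files
dataset.upload()                            # actually uploads the file content
dataset.finalize()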
Hi ExcitedFish86
Good question, how do you "connect" the 3 nodes? (i.e. what the framework you are using)
Hi DilapidatedDucks58
trains-agent tries to resolve the torch package based on the specific cuda version inside the docker (or on the host machine if used in virtual-env mode). It seems to fail finding the specific version "torch==1.6.0.dev20200421+cu101"
I assume this version was automatically detected by trains when running manually. If this version came from a private artifactory, you can add it to the trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L...
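A hedged sketch of that section (the artifactory URL is a placeholder):
agent {
    package_manager {
        # extra pip repositories to search when resolving packages
        extra_index_url: ["https://artifactory.example.com/api/pypi/pypi-local/simple"]
    }
}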
Ohh, I see now, yes that should be fixed as well
JitteryCoyote63 nice hack
how come it is not automatically logged as console output ?
Hmm let me rerun (offline mode right ?)
Any plans to add unpublished state for clearml-serving?
Hmm OddShrimp85 do you mean like flag, not being served ?
Should we use archive ?
The publish state basically locks the Task/Model so they cannot be changed; should we enable unlocking (i.e. un-publish)? wdyt?
Yes, just note that task.execute_remotely() should be the last call, because it will literally stop the local run before you add the Args section
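A minimal sketch of the ordering being described (project/queue names and the args are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote demo')
args = {'epochs': 10, 'lr': 0.001}
task.connect(args)  # register the Args section while still running locally
task.execute_remotely(queue_name='default')  # last call: stops the local run and enqueues the task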
Thanks BroadSeaturtle49
I think I was able to locate the issue: != breaks the pytorch lookup
I will make sure we fix asap and release an RC.
BTW: how come 0.13.x has no linux x64 support? And the same for 0.12.x
https://download.pytorch.org/whl/cu111/torch_stable.html
BTW: you can always set a different config file with an environment variable: CLEARML_CONFIG_FILE="path/to/config/file"
Hmm can you try: --args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']"
neat! please update on your progress, maybe we should add an upgrade section once you have the details worked out
it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
BoredHedgehog47 of course you can scale on multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod is actually running the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...