It seems like the configuration is cached somehow, even when you change the CLI parameters.
@<1523704461418041344:profile|EnormousCormorant39> nice!
Yes the configuration is cached so that after you set it once you can just call clearml-session again without all the arguments
What was the actual issue ? Should we add something to the printout?
I suppose the same would need to be done for any client PC running clearml from which you are submitting dataset upload jobs?
Correct
That is, the dataset is perhaps local to my laptop, or on a development VM that is not in the clearml system, but from there I want to submit a copy of a dataset, so I would need to configure the storage section in the same way as well?
Correct
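For illustration, a minimal sketch of what that submission could look like once the client machine's clearml.conf storage credentials are configured (project, dataset name, path, and bucket are placeholders):
```python
from clearml import Dataset

# assumes the s3/minio credentials are already set in this machine's clearml.conf
ds = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
ds.add_files("/path/to/local/data")
ds.upload(output_url="s3://my-bucket/datasets")  # storage resolved via the configured credentials
ds.finalize()
```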
PompousBeetle71 is this an argparse argument or a connected dictionary?
i.e. run: pip install --upgrade trains
PompousBeetle71 BTW: if you remove the type=str from the argparse, it will do what you want: None will stay None (instead of '') and all other values will be of type str, as this is always the default 🙂
PompousBeetle71 quick question: will you ever want to pass an empty string? The reason for asking is that it is either one or the other, there is no way for Trains to actually differentiate (from the web UI perspective, this is just an empty string field...)
Hmm... that's what happens, with the exception of None/'' when the type is str... There is no way to differentiate in the UI.
This is why we opted for type=str: it will "cast" everything to str so you always get str, while not specifying a type will leave the variable as is... If you have an idea on how to support both, feel free to suggest 🙂
... would not work for huge LLM-style models.
Yes, I agree... but then, if the model is small enough, you can just keep it in memory...
I think so (you can also comment out the Task.init() just to verify this is not a clearml issue)
Hi @<1538330703932952576:profile|ThickSeaurchin47>
Specifically I’m getting the error “could not access credentials”
Put your minio credentials here:
None
with ?
multipart: false
secure: false
If so, can you post here your aws.s3 section of the clearml.conf? (of course replacing the actual sensitive information with *s)
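For reference, a sketch of what the sdk.aws.s3 section of clearml.conf could look like for a minio server (host, key, and secret values are placeholders):
```
aws {
    s3 {
        credentials: [
            {
                host: "my-minio-host:9000"
                key: "MINIO_ACCESS_KEY"
                secret: "MINIO_SECRET_KEY"
                multipart: false
                secure: false
            }
        ]
    }
}
```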
Yes, only task.execute_remotely() should be the last call, because it literally will stop the local run before you add the Args section.
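A minimal sketch of the intended ordering (project, queue, and parameter names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

params = {"lr": 0.001, "epochs": 10}
task.connect(params)  # connect the Args / configuration first

# must be the last setup call: it stops the local process and
# enqueues the task so an agent picks it up
task.execute_remotely(queue_name="default")

# from here on the code only runs on the agent
print("running remotely with", params)
```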
Hi JumpyDragonfly13
Let's assume we have two machines, one we call remote, one we call laptop (at least for this discussion)
On the Remote machine we need to run (notice we must have docker preinstalled on the remote machine; it can work without docker, let me know if this is the case for you):
clearml-agent daemon --queue interactive --create-queue --docker
On the Laptop we run:
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
What clearml-session will do is crea...
Our remote machine is Windows 10
JumpyDragonfly13 seems like Windows 10 + docker is the issue (that would explain the OCI error)
Is this relevant ?
https://github.com/microsoft/WSL/issues/5100
ReassuredTiger98 when you look for task "dca2e3ded7fc4c28b342f912395ab9bc" there are no artifacts ?
Could you add some prints? this should have worked...
Hi ReassuredTiger98
Could you add some prints? Before / after the artifact upload?
Also what's the clearml version you are using ?
ReassuredTiger98 I'm trying to debug what's going on, because it should have worked.
Regarding prints ...
```python
from clearml import Task
from time import sleep

def main():
    task = Task.init(project_name="test", task_name="test")
    d = {"a": "1"}
    print('uploading artifact')
    task.upload_artifact("myArtifact", d)
    print('done uploading artifact')
    # not sure if this helps but it won't hurt to debug
    sleep(3.0)

if __name__ == "__main__":
    main()
```
BattyLion34 let me see if I understand.
The same base_task_id, when cloned in the UI and enqueued on the same queue as the pipeline, works, but when the pipeline runs the same Task it fails?!
Could it be that you enqueue them on different queues ?
That makes no sense to me?!
Are you absolutely sure the nntrain task is executed on the same queue? (basically, could it be that the nntraining is executed on a different queue in these two cases?)
BattyLion34 is this consistent?
(Really, I can't see any difference; one time it is able to create the venv and another it is failing with a permission error)
BattyLion34
Maybe something inside the task is different?!
Could you run these lines and send me the result:
from clearml import Task
print(Task.get_task(task_id='failing task id').export_task())
print(Task.get_task(task_id='working task id').export_task())
Hi BattyLion34
I might have a solution. In order to make sure the two agents are not sharing the "temp" folder:
create two copies of ~/clearml.conf, let's call them:
~/clearml_service.conf
~/clearml_agent.conf
Then in each one select a different venvs_dir
see here:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L90
for example:
~/.clearml/venvs-builds1
~/.clearml/venvs-builds2
Now start the two agents with:
The service age...
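For example, something along these lines (a sketch assuming the CLEARML_CONFIG_FILE environment variable is used to point each agent at its own config file; queue names are placeholders):
```
CLEARML_CONFIG_FILE=~/clearml_service.conf clearml-agent daemon --queue services --services-mode
CLEARML_CONFIG_FILE=~/clearml_agent.conf clearml-agent daemon --queue default
```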
BattyLion34 I have a theory: I think that any Task on the "default" queue will fail if a Task is running on the "service" queue.
Could you create a toy Task that just prints ".", sleeps for 5 seconds, and then prints again?
Then while that Task is running, from the UI launch the Task that passed on the "default" queue. If my theory holds it should fail, and then we will be getting somewhere 🙂
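Something along these lines should do for the toy Task (project and task names are placeholders):
```python
from time import sleep
from clearml import Task

task = Task.init(project_name="debug", task_name="toy sleep task")
for _ in range(60):
    print(".")
    sleep(5.0)
```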
any idea why I cannot select text inside the table?
Ichh, seems again like plotly 😞 I have to admit it's quite annoying to me as well... I would vote here: None