It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:
2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads
And there's no sign of updates on the dashboard
Hmm, that could point to an issue uploading the last images (which are larger than regular scalars). Could you try flushing and waiting?
i.e. task.flush() followed by sleep(45)
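A minimal sketch of that flush-and-wait check, assuming the task was created earlier with Task.init:
```python
import time
from clearml import Task

# sketch: give the background uploader time to push the last (larger) images
task = Task.current_task()  # the Task created earlier with Task.init
task.flush()                # send any pending reports / uploads
time.sleep(45)              # wait before the process exits
```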
It seems like there is no way to define that a Task requires docker support from an agent, right?
Correct, basically the idea is that you either have workers running in venv mode or in docker mode.
If you have a mixture of the two, then you can have the venv agents pulling from one queue (say default_venv) and the docker mode agents pulling from a different queue (say default_docker). This way you always know what you are getting when you enqueue your Task
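For example, a hedged sketch of sending a Task to the docker-mode queue; the queue name default_docker follows the example above, and the template task name and docker image are assumptions:
```python
from clearml import Task

# sketch: clone an existing experiment and send it to the queue served by docker-mode agents
template = Task.get_task(project_name="examples", task_name="my experiment")
task = Task.clone(source_task=template)
task.set_base_docker("nvidia/cuda:11.1-runtime-ubuntu20.04")  # image the docker agent should spin up
Task.enqueue(task, queue_name="default_docker")               # agents started with --docker pull from here
```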
I think you are onto a good flow, quick iterations / discussions here, then if we need more support or an action-item then we can switch to GitHub. For example with feature requests we usually wait to see if different people find them useful, then we bump their priority internally, this is best done using GitHub Issues 🙂
Does it handle 2FA if my repo is on GitHub and my account needs 2FA to sign in?
It does not 😞
Does Task.connect send each element of the dictionary as a separate API request? Has anyone else encountered this issue?
Hi SuperiorPanda77
The task.connect call ends up as a single API call, with all the data being sent in a single request.
That said, maybe the connect dict is not the best solution for a thousand-key dictionary...
Maybe an artifact, or connect_configuration, would be better suited?
wdyt?
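A rough sketch of the two alternatives mentioned; the project/task names and the dictionary are illustrative:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="large dict")
big_dict = {f"key_{i}": i for i in range(10_000)}  # illustrative thousand-key dictionary

# option 1: store it as a single configuration object instead of per-key hyperparameters
task.connect_configuration(big_dict, name="big_dict")

# option 2: store it as an artifact, uploaded once and easy to download later
task.upload_artifact(name="big_dict", artifact_object=big_dict)
```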
But there is no need for 2FA for cloning a repo.
Basically, create a token and use it as the user/password.
EDIT:
With read-only permissions 🙂
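One common way to wire that in, assuming the credentials live in the agent's clearml.conf; values are placeholders:
```
# clearml.conf on the agent machine (values are placeholders)
agent {
    git_user: "my-username"
    git_pass: "ghp_xxxxxxxxxxxxxxxx"   # read-only personal access token instead of a password
}
```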
Hi CloudySwallow27
how can I just "define" it on my local PC, but not actually run it.
You can use the clearml-task CLI:
https://clear.ml/docs/latest/docs/apps/clearml_task#how-does-clearml-task-work
Or you can add the following lines to your code; they will cause the local execution to stop and continue on a remote machine (basically creating the Task and pushing it into an execution queue, or just aborting it): task = Task.init(...) followed by task.execute_remotely()
https://clear.ml/do...
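A minimal sketch of that pattern; the project, task, and queue names are assumptions:
```python
from clearml import Task

# create the Task locally, stop the local run, and push it into an execution queue
task = Task.init(project_name="examples", task_name="remote run")
task.execute_remotely(queue_name="default", exit_process=True)

# everything below this line only executes on the remote agent
print("running on the agent")
```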
Hi SarcasticSparrow10, so yes it does, this is more efficient when using pytorch loaders, and in some other situations.
To disable it, add to your clearml.conf: sdk.development.report_use_subprocess = false
2. Interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why Python has this limitation, let me check it...)
SarcasticSparrow10 how do I reproduce it?
I tried launching from a sub process that is a daemon and it worked. Are you using ProcessPool?
Yes it does. I'm assuming each job is launched using a multiprocessing.Pool (which translates into a sub process). Let me see if I can reproduce this behavior.
Okay, I was able to reproduce, this will only happen if you are running from a daemon process (like in the case of a process pool), Python is sometimes very picky when it comes to multi-threading/processes I'll check what we can do 🙂
SarcasticSparrow10 LOL there is a hack around it 🙂
Run your code with python -O, which basically skips over all assertion checks
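A tiny illustration of what -O changes (standard Python behavior, nothing ClearML-specific):
```python
# save as check_optimize.py and compare:
#   python check_optimize.py     -> AssertionError is raised
#   python -O check_optimize.py  -> the assert is stripped, the script runs to the end
import sys

print("optimize flag:", sys.flags.optimize)  # 0 normally, 1 under -O
assert False, "assert statements are removed when running with -O"
print("reached the end")
```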
So what will you query?
I just want to be very precise and concise about them
Always appreciated 🙂
Guys FYI: params = task.get_parameters_as_dict()
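For context, a short sketch of how that call is typically used; the task id is a placeholder:
```python
from clearml import Task

# sketch: read back the parameters of an existing task as a nested dict
task = Task.get_task(task_id="<your-task-id>")   # placeholder id
params = task.get_parameters_as_dict()
print(params.get("General", {}))  # hyperparameters usually land under a section such as "General"
```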
Hi SmarmySeaurchin8, you can point to any configuration file by setting the environment variable: TRAINS_CONFIG_FILE=/home/user/my_trains.conf
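A sketch, assuming the variable is set before the SDK is first imported/used; the path and task names are examples:
```python
import os

# point the SDK at a custom configuration file (path is just an example)
os.environ["TRAINS_CONFIG_FILE"] = "/home/user/my_trains.conf"

from trains import Task  # imported after the variable is set

task = Task.init(project_name="examples", task_name="custom config file")
```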
JitteryCoyote63 no I think this is all controlled from the python side.
Let me check something
I think the easiest way is to add another glue instance and connect it with CPU pods and the services queue. I have to admit that it has been a while since I looked at the chart but there should be a way to do that
Hi EnviousStarfish54
I remember this feature request, let me check where it stands..
That is odd ...
Could you open a GitHub issue?
Is this on any upload? How do I reproduce it?
However, SNPE performs quantization with a precompiled CLI binary instead of a Python library (which also needs to be installed). What would be the pipeline in this case?
I would imagine a container with a preinstalled SNPE compiler / quantizer, and a Python script triggering the process?
One more question: if the quantization process is triggered, will it be considered a separate task?
I think this makes sense, since you probably want a container with the SNPE environment, m...
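A hypothetical sketch of that pipeline; the container image, binary name, and command-line flags are illustrative assumptions, not the actual SNPE interface:
```python
import subprocess
from clearml import Task

# a Task that wraps a precompiled quantizer CLI inside a container that already has it installed
task = Task.init(project_name="examples", task_name="snpe quantization")
task.set_base_docker("my-registry/snpe-toolchain:latest")  # assumed image with the SNPE tools preinstalled

# run the precompiled binary (command line is illustrative only)
subprocess.run(
    ["snpe-dlc-quantize", "--input_dlc", "model.dlc", "--output_dlc", "model_q.dlc"],
    check=True,
)

# keep the quantized model with the experiment
task.upload_artifact(name="quantized_model", artifact_object="model_q.dlc")
```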
Hi TrickyRaccoon92
Yes please update me once you can, I would love to be able to reproduce the issue so we could fix for the next RC 🙂
Hmm worked now...
When Task.init is called with output_uri='s3://my_bucket/sub_folder', the artifact ends up at:
s3://my_bucket/sub_folder/examples/upload issue.4c746400d4334ec7b389dd6232082313/artifacts/test/test.json
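For reference, a sketch of the call that produces that layout; the project and task names follow the path above, and the artifact payload is illustrative:
```python
from clearml import Task

# artifacts end up under <output_uri>/<project>/<task name>.<task id>/artifacts/<name>/...
task = Task.init(
    project_name="examples",
    task_name="upload issue",
    output_uri="s3://my_bucket/sub_folder",
)
task.upload_artifact(name="test", artifact_object={"hello": "world"})  # dict artifacts are stored as JSON
```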
Hmm, let me check, there is a chance the level is dropped when manually reporting (it might be saved for internal critical reports). Regardless, I can't see any reason we could not allow controlling it.