Hmm @<1523701083040387072:profile|UnevenDolphin73> I think this is the reason, None
and this means that even without a full lock file, poetry can still build an environment
Three options:
1. In your code: Task.init(..., output_uri='s3://.../')
2. Configure a default output_uri to be used by all tasks: https://github.com/allegroai/clearml/blob/64042f6c4fdaaf15b6c5f816f2fbf50f89c313e2/docs/clearml.conf#L156
3. In the UI, after you clone a Task, under the Execution tab: "Output" > "Destination"
In all cases output_uri can be:
/mnt/share/folder (if you have a shared folder between all machines), http://trains-server:8081/ , gs://bucket or azure://bucket/
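For example, a minimal sketch of option 1 (the bucket name and paths are placeholders, not real destinations):

from clearml import Task

# everything this task outputs (models, artifacts) will be uploaded to the given destination
task = Task.init(
    project_name='examples',
    task_name='output-uri-demo',
    output_uri='s3://my-bucket/models/',  # placeholder; could also be gs://..., azure://... or /mnt/share/folder
)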
EnormousWorm79 you mean to get the DAG graph of the Dataset (like you see in the plots section)?
DilapidatedDucks58 no don't say that, you are wonderful
trains-agent daemon --gpus 0 --queue my_queue -d
should create a worker machine:gpu0
Then you can do trains-agent daemon --gpus 1 --queue my_queue -d
which will create machine:gpu1
Hi @<1526371965655322624:profile|NuttyCamel41>
I think that the only way to actually get a huge number of API calls is with a lot of machines.
For example, regardless of the amount of console logs you print, it will only be a single call, as these are batched and sent every 2-10 seconds. The same goes for metric reporting etc.
On the free tier you can already test the amount of API calls, I think the mechanism is exactly the same
fyi: I would put this question in the channel
so you have a repo with poetry that some users update and some do not?
All working on the same branch ?
Hmmm maybe
I thought that was expected behavior from poetry side actually
I think this is the expected behavior, hence bug?!
okay, let me see if I can nail down the issue
The issue is the 400 returned from the server, let me check with the backend guys
Are you seeing the argparse arguments in the UI (when running locally) ?
might be my folder permissions hmm
That actually makes sense, also notice that if you are running under a different user, the ~ (home folder) is different
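A tiny illustrative sketch of why that matters (purely an example, not from the thread):

import os

# '~' resolves to the current user's home folder, so a different user gets a different config path
print(os.path.expanduser('~/clearml.conf'))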
Hi DeliciousBluewhale87
You mean per Task? Is it reporting? Is it like the project overview?
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
What do you mean by simulate?
You can manually set up and run a Task if you need:
clearml-agent execute --id <task_id> (add --docker for docker mode)
This will set up the env and run the task
like what all are important metric monitoring queries w.r.t. the serving tasks that can be visualized and shown in grafana?
Basically latency and requests per minute are automatically reported. Additional reports are based on your REST API in/out.
Imagine the following REST API request JSON payload
{"x": 123, "y": 456}
and a return JSON of
{"z": 789}
The metrics you can add to the monitoring are the keys in both these JSONs, i.e. "x", "y", "z"
These metrics can be both log...
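As a minimal sketch of such a request (the host, port and endpoint name here are assumptions about a typical clearml-serving deployment, adjust to yours):

import requests

# hypothetical serving endpoint; replace with your own deployment's URL
payload = {'x': 123, 'y': 456}
response = requests.post('http://localhost:8080/serve/my_model', json=payload)
print(response.json())  # e.g. {'z': 789} -- 'x', 'y' and 'z' are the keys you could monitor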
MysteriousBee56 what do you mean by "local repository"?
Like no git server, or local commit before pushing it ?
Well it is there, do you have it in your docker-compose as well?
https://github.com/allegroai/trains-server/blob/master/docker-compose.yml#L55
Thanks MortifiedDove27 ! Let me see if I can reproduce it, if I understand the difference, it's the Task.init in a nested function, is that it?
BTW what's the hydra version? Python, and OS?
are models technically Tasks and can they be treated as such? If not, how to delete a model permanently (both from the server and from AWS storage)?
When you call Task.delete() it actually goes over all the models/artifacts and deletes them from the storage
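A minimal sketch of deleting a task together with its stored files (the task ID is a placeholder; the delete_artifacts_and_models argument is my assumption of the current clearml API, check your version's docs):

from clearml import Task

# placeholder ID of the task whose models/artifacts you want removed
task = Task.get_task(task_id='aabbccdd11223344')
# removes the task and deletes its models/artifacts from the storage (e.g. S3)
task.delete(delete_artifacts_and_models=True)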
Hmm should not make a diff.
Could you verify it still doesn't work with TF 2.4 ?
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all
It should print to the console:
print(task.get_output_log_web_page())
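A minimal sketch, assuming a task is already initialized (project/task names are placeholders):

from clearml import Task

task = Task.init(project_name='examples', task_name='web-page-demo')
# prints the URL of this task's log/console page in the web UI
print(task.get_output_log_web_page())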
The API server by default spins up multiple processes (they all might be busy at the time with a huge flood of requests, but this is still multi-process). Let me check if there is an easy way to set more processes
Sure, run: clearml-agent init
It is a CLI wizard to configure the initial configuration file.
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
task.close()
will do that
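For example, a minimal sketch (project/task names are placeholders):

from clearml import Task

task = Task.init(project_name='examples', task_name='colab-demo')
# ... your code ...
# flushes all pending reports and closes the task
task.close()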
BTW what's the exception you are getting ?
is it a shared network mount ? could you just delete the entire ~/.clearml on the host machine ?
Hmm, yes this fits the message. Which basically says that it gave up on analyzing the code because it ran out of time. Is the execution very short? Or the repo very large?
Yes that's the reason, basically there is a background thread analyzing the code; at the end of the execution, if it is still running (hence the question regarding execution time) we give it an extra 10 seconds to come up with answers, otherwise we terminate it, so the code won't get stuck. Makes sense to you?