Thanks @<1547028074090991616:profile|ShaggySwan64> !!
Passing to the backend guys to take a look
BTW: how did it get there?
PungentLouse55, make sure you fix the objective metric and the args:
Add the "General/" prefix to the list of arguments to optimize, and change the objective metric from "Accuracy" to "epoch_accuracy".
Just to make sure, the first two steps are working?
Maybe it has to do with the fact that the "training" step specifies a docker image; could you try removing it and check?
BTW: a few pointers (there's a short sketch below):
The return_values argument is used to specify multiple returned objects stored individually, not the type of the object. If there is a single object, there is no need to specify it.
The parents argument is optional; the pipeline component optimizes execution based on inputs. For example, in your code, all pipeline comp...
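Something like this, as an illustrative sketch assuming PipelineDecorator-style components (the step names and bodies are made up):
```
from clearml.automation.controller import PipelineDecorator

# two returned objects -> return_values names them so each is stored individually
@PipelineDecorator.component(return_values=['train_data', 'val_data'])
def prepare_data(source):
    train_data, val_data = source[:-10], source[-10:]
    return train_data, val_data

# a single returned object -> no need to specify return_values at all
@PipelineDecorator.component()
def train(train_data, val_data):
    model = {'n_train': len(train_data), 'n_val': len(val_data)}
    return model

# no explicit parents needed: train() consumes prepare_data()'s outputs,
# so the execution order is inferred from the inputs
@PipelineDecorator.pipeline(name='example pipeline', project='examples', version='0.1')
def pipeline_logic():
    train_data, val_data = prepare_data(list(range(100)))
    return train(train_data, val_data)

if __name__ == '__main__':
    # run everything in the local process for a quick test
    PipelineDecorator.run_locally()
    pipeline_logic()
```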
Wait, how do I reproduce it on the community server? Maybe it has something to do with the number of columns? Or whether it is already wider than the screen? What's your browser / OS?
looks like a great idea, I'll make sure to pass it along and that someone reply 🙂
Could you expand on the use case of #18? How would you use it? What problem would it be solving?
and then?
The thing is, programmatically this is not easy to do as an API, because in the end the "function" (i.e. LCI) never returns; it connects over SSH and stays connected.
But you can query the Task it creates: the project is known, the user is known, and it is of a special type/tag.
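Something along these lines as a sketch; the project name, tag and status filter are assumptions on my side, so replace them with whatever the interactive session actually uses on your server:
```
from clearml import Task

sessions = Task.get_tasks(
    project_name='DevOps',  # assumed project for interactive sessions
    task_filter={'tags': ['interactive'], 'status': ['in_progress']},  # assumed tag/status
)
for t in sessions:
    print(t.id, t.name, t.status)
```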
Hi JitteryCoyote63 ,
When you shut down the task (manually with close() or when the process finishes) it waits for the uploads...
Why do you need to specifically wait for all the artifact uploads? (Currently you can stop the artifacts upload thread and wait for all the artifacts, but that seems like a bad hack.)
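If the goal is simply to block until the pending uploads are done before moving on, something like this should be close (a sketch; whether flush covers every async artifact upload can depend on the SDK version):
```
from trains import Task

task = Task.init(project_name='examples', task_name='artifact upload')
task.upload_artifact('results', artifact_object={'score': 0.9})

# block here until pending uploads are flushed, instead of only at close() / process exit
task.flush(wait_for_uploads=True)
```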
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
ShallowCat10 try something similar to this one; do notice that it might take a while to get all the task objects, so I would start with a single one 🙂
```
from trains import Task

# fetching all tasks in the project can take a while, so start with one
tasks = Task.get_tasks(project_name='my_project')
for task in tasks:
    scalars = task.get_reported_scalars()
    # 'title' / 'original_series' stand for the graph title and series as reported
    for x, y in zip(scalars['title']['original_series']['x'], scalars['title']['original_series']['y']):
        # re-report each point under a new series; x holds the original iteration
        task.get_logger().report_scalar(title='title', series='new_series', value=y, iteration=int(x))
```
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 😞
MysteriousBee56 I see...
So yes, you can: with the APIClient you have full RESTful access to the backend.
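For example, something along these lines with the APIClient (the uri field on models.edit is from memory, so double-check against the REST API reference; the model ID and the new URL are placeholders):
```
from trains.backend_api.session.client import APIClient

client = APIClient()
# point an existing model entry at a new storage location
client.models.edit(model='<model_id>', uri='s3://my-bucket/path/to/model.pkl')
```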
I think there was a similar discussion https://allegroai-trains.slack.com/archives/CTK20V944/p1593524144116300
HandsomeCrow5 how did you end up solving it? I think you had a similar use case?!
trains-agent doesn't run the clone, it is pip...
basically calling "pip install git+https://..."
Not sure you can pass extra arguments
Also, this is not a setup problem, otherwise it would have been consistently failing ... this actually looks like a network issue.
The only thing I can think of is retrying the install if we get a network error (not sure what the exit code of pip is, though; maybe 9?)
but DS, in order for models to be uploaded, you still have to set output_uri=True in the Task.init()
No, if you set the default_output_uri, there is no need to pass output_uri=True
in the Task.init()
🙂
It is basically setting it for you, make sense?
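To make it concrete, a sketch of the two options (project, task and bucket names are placeholders; the config file is clearml.conf, or trains.conf on older versions):
```
# Option A: per task, in code
from clearml import Task
task = Task.init(project_name='examples', task_name='train', output_uri=True)

# Option B: once, in the config file, so it is effectively set for you on every Task.init
# sdk {
#     development {
#         default_output_uri: "s3://my-bucket/models"
#     }
# }
```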
I understand, but how do you launch the clearml-agent itself?
clearml-agent daemon --detached --queue default --docker
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server; it automatically stores the git repo link, commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (which is always the case even if it is actually on the same machine).
The trains-agent will create a new virtual-environmen...
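A rough sketch of that flow (project, task and queue names are placeholders):
```
from trains import Task

# initial local run: registers the experiment, including git repo, commit ID
# and uncommitted changes, on the server
task = Task.init(project_name='examples', task_name='my training run')

# later, the experiment can be cloned and pushed to a queue; an agent watching
# that queue rebuilds the environment (virtualenv, packages, code) and runs it
cloned = Task.clone(source_task=task.id)
Task.enqueue(cloned, queue_name='default')
```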
Thank you!
One thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can
That might be it! let me check the code again...
Is it possible in ClearML to somehow allocate resources so that maybe after running a number of Alice's tasks, Bob's tasks get processed (like maybe in a round-robin fashion)?
Hi DeliciousBluewhale87
A few options here:
1. Set the agent up with high / low priority queues. Make sure Alice pushes into low priority (aka HPO), then Bob can push into high priority when he needs to. This makes a lot of sense when you have automation processes spinning many experiments (see the command sketch below).
2. Expanding on (1), you could set differe...
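As an illustration of option 1: a single agent can listen on several queues and will always pull from the first non-empty queue in the order given, e.g. (queue names are placeholders)
clearml-agent daemon --queue high_priority low_priority --detached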
I see, good point. It does look like mostly boilerplate code; not sure where it actually runs the python command, but I'm sure it is there (python.ts, but I could not locate who is actually using it)
DeliciousBluewhale87 Yes I think so, do notice that you might end up with a maximum of 12 pods.
You can also do the following with max 10 nodes (notice --queue can always get a list of queues; it will pull based on the order of the queues):
python k8s_glue_example.py --queue high_priority_q low_priority_q --ports-mode --num-of-services 10
Does Task.connect send each element of the dictionary as a separate API request? Has anyone else encountered this issue?
Hi SuperiorPanda77
The task.connect ends up as a single call, with all the data being sent on a single request.
That said, maybe connecting a dict is not the best solution for a thousand-key dictionary ...
Maybe an artifact, or connect_configuration, would be better suited?
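For example, either of these keeps it to a single object instead of thousands of individual parameters (names and sizes below are placeholders; the name argument of connect_configuration assumes a reasonably recent SDK):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='big config')
big_dict = {f'key_{i}': i for i in range(10000)}

# stored as one configuration object (shows up in the Configuration section)
task.connect_configuration(configuration=big_dict, name='big_config')

# or stored as an artifact attached to the task
task.upload_artifact('big_config', artifact_object=big_dict)
```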
wdyt?
Hi PungentLouse55
Hope you are not tired of me
Lol 🙂 No worries
I am using trains 0.16.1
Are you referring to the trains-server version or the Python package? (They are not the same and can be of totally different versions.)
Hi HealthyStarfish45
Funny, just today I had a similar discussion on Slurm:
https://allegroai-trains.slack.com/archives/CTK20V944/p1603794531453000
Anyhow, when you say "[scale up agents]", are you referring to a machine constantly running an agent pulling jobs from the queue, where the machine itself (aka the resource) is managed as a Slurm job?