basically @<1554638166823014400:profile|ExuberantBat24> you can think of hyper-datasets as a "feature-store for unstructured data"
can i run it on an agent that doesn't have gpu?
Sure this is fully supported
when i run clearml-serving it throws me an error "please provide specific config.pbtxt definition"
Yes, this is a small file that tells the Triton server how to load the model:
Here is an example:
https://github.com/triton-inference-server/server/blob/main/docs/examples/model_repository/inception_graphdef/config.pbtxt
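For reference, the gist of that file looks roughly like this (following the pattern of the linked example; the names and dims are model-specific, so treat these values as illustrative):
```
name: "inception_graphdef"
platform: "tensorflow_graphdef"
max_batch_size: 128
input [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 299, 299, 3 ]
  }
]
output [
  {
    name: "InceptionV3/Predictions/Softmax"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]
```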
What do you mean by "modules first and find a way to install that package" ?
Are those modules already in wheels? Are they part of a git repository?
(the pipeline component can also start inside a git repository it clones)
What I'm trying to do is to filter between two datetimes... Is that possible?
could you expand ?
same: Not Found (#404)
May I suggest you DM it to me (so it is not public)?
Sure LazyTurkey38 here's a nice hack for that:
```python
# code here
task.execute_remotely(queue_name=None, clone=False, exit_process=False)

# patch the Task and actually send it for execution
if Task.running_locally():
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
```
You can also clear the git diff by passing `"diff": ""` in the same `script` dict.
wdyt?
`Task.current_task().connect(training_args, name='huggingface args')`
And you should be able to change them when launching remotely
SmallDeer34 btw: "set_parameters_as_dict" will replace all the arguments (and is one-way) ...
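connect(), in contrast, is two-way. A minimal sketch of the flow (project/task names here are just placeholders):
```python
from clearml import Task
from transformers import TrainingArguments

task = Task.init(project_name='examples', task_name='hf finetune')

training_args = TrainingArguments(output_dir='./results', num_train_epochs=3)
# connect() logs the arguments, and when the task is launched remotely
# it overrides them with whatever was edited in the UI
task.connect(training_args, name='huggingface args')
```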
Damn, JitteryCoyote63, seems like a bug in the backend; it will not allow you to change the task type to the new types
Just dropping this here but I've had some funky compressions with very small datasets!
Odd deflate behavior ...?!
do you know how I can save all the logs and all the metric images?
These are stored in the clearml-server, no? What am I missing?
Hi ShakyJellyfish91
It seems clearml is using a single connection, which takes a long time to download
Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763
Does `max_connections=10` mean 10 concurrent connections?
So in summary: subprocess calls appear to break ClearML tracking, even if I do Task.init() in both main.py and train.py.
Okay let me see if we can reproduce & fix this, it should not be long
Hi @<1552101447716311040:profile|SteadySeahorse58>
ValueError: Could not find queue named "services"
Did you set up an agent / auto-scaler? Where will the pipeline and its components be running?
Just to clarify, where do I run the second command?
Anywhere, just open a python console and import the offline task:
```python
from trains import Task

Task.import_offline_session('./my_task_aaa.zip')
```
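For context, a sketch of the full offline round trip (offline mode was added around trains v0.16, if I recall; names and paths here are illustrative):
```python
from trains import Task

# on the disconnected machine: record everything into a local zip
Task.set_offline(offline_mode=True)
task = Task.init(project_name='examples', task_name='offline run')
# ... your training code runs here; on exit, the console prints the zip path ...

# later, on a machine that can reach the server: import the recorded session
Task.import_offline_session('./my_task_aaa.zip')
```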
Related, how do I specify in my code the cache_dir where the zip is saved?
This is the Trains cache folder, you can set it in the trains.conf file:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/docs/trains.conf#L24
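The relevant snippet looks roughly like this (the path shown is the default from the sample conf):
```
sdk {
    storage {
        cache {
            # local folder used for the trains download cache
            default_base_dir: "~/.trains/cache"
        }
    }
}
```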
BroadMole98 as one can expect, a long answer as well
I have a workflow with 19000 job nodes in it.
wow, 19k job nodes? As in a single pipeline with 19k steps?
The main idea of the trains-agent is to allow multi-node workloads, to create pipelines on top of a scheduler without worrying about docker packaging (done automatically for you), and to have a proper scheduler with priority (which is missing from k8s)
If the first step is just "logging" all the steps, you can easily add "Task...
Hi @<1541954607595393024:profile|BattyCrocodile47>
Did you check None ?
You are not supposed to do 2, 3, 4
After (1) you should just do
ssh root@localhost -p 8022
and provide the password that is written in the CLI
(Notice: pass `--public-ip` if your remote machine is using a public IP you can access)
EnviousStarfish54 Yes, I'm not sure what happens there; we will have to dive deeper. But now that you got us a code snippet that reproduces the issue, it should not be very complicated to fix (I hope)
So far, I modified the code to set DOCKER_ROOT_CONF_FILE to what I want!!!
Interesting, do you think a PR is a good next step? How would one configure it?
corporate firewall... let's start with http
Yes, it seems so
Hi GrittyCormorant73
When I archive the pipeline and go into the archive and delete the pipeline, the artifacts are not deleted.
Which clearml-server version are you using? The artifact delete was only recently added
I would ideally just want to have NVIDIA drivers and Docker on the on-prem nodes (along with the clearML agents). Would that allow me to get by with basic job scheduling/queues through clearML?
Yes, this is fully supported and very easy to set up.
Regarding limiting users' usage: this is doable. I think the easiest solution, both for the users and for managing the cluster, is introducing priority into the queue: basically, a user can push jobs into low priority, and only some users can push into high...
In that case I suggest you turn on the venv cache, it will accelerate the conda environment building because it will cache the entire conda env.
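To turn it on, set the venvs_cache section in the agent's clearml.conf; roughly like this (values shown are the defaults from the sample conf):
```
agent {
    venvs_cache: {
        # maximum number of cached environments to keep
        max_entries: 10
        # minimum free space (GB) to keep on the drive
        free_space_threshold_gb: 2.0
        # setting the path is what actually enables the cache
        path: ~/.clearml/venvs-cache
    }
}
```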
I think that the first model saved gets the task name as its name, and the following models take `f"{task_name} - {file_name}"`
Hmm, I'm not sure what would be a good way to make it consistent, would it make sense to always have the model file name?
I guess it takes some time before the correct names are assigned?
Hmm that is odd, I have a feeling it has to do with calling Task.close()?!
I just tried with the latest clearml version and it seemed to work as expected
How do you run the clearml-agent in docker mode?
clearml-agent --docker
See here:
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
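For example (queue name and base image here are just placeholders):
```
clearml-agent daemon --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
```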
Thank you EnviousStarfish54 !
This is very helpful!
I'm looking at Kedro and the project you shared, and a few thoughts came to mind:
I very much like the idea of using functions as "nodes" (and to extend, using notebook cells with tags as nodes). This got me thinking, I'm pretty sure we could have a similar implementation with ClearML. My thinking is using inspect
or dill
to convert the functions/cells into plain text code, automatically analyze the runtime requirements, and creat...
It will also allow you to pass them to Hydra (either as overrides, or by directly editing the entire hydra config)
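A quick sketch of the inspect part of that idea (the function here is just an illustration):
```python
import inspect

def preprocess_node(input_path: str, output_path: str) -> None:
    # the "node" logic would live here
    ...

# grab the plain-text source of the node; from here one could analyze its
# imports/requirements and package it as a standalone script for remote execution
node_source = inspect.getsource(preprocess_node)
print(node_source)
```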