GiddyTurkey39 what do you have in the Task itself?
(i.e. git repo, uncommitted changes, installed packages)
Hmm, is there a way to do this via code?
Yes, clone the Task with Task.clone
Then do data = task.export_task()
and edit the data object (see the execution section)
Then update it back with task.update_task(data)
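Putting that together, a minimal sketch (the task ID and the edited field are placeholders; print the exported dict to see the exact keys in your setup):
```python
from clearml import Task

# clone the original task (the source task ID is a placeholder)
cloned = Task.clone(source_task="<source_task_id>", name="edited clone")

# export the full task configuration as an editable dict
data = cloned.export_task()

# edit the execution section, e.g. point the task at a different branch
# (the branch name here is just an example)
data["script"]["branch"] = "my-feature-branch"

# write the modified configuration back to the cloned task
cloned.update_task(data)
```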
Hi @<1529633468214939648:profile|CostlyElephant1>
what seems to be the issue? I could not locate anything in the log
"Environment setup completed successfully
Starting Task Execution:"
Do you mean it takes a long time to setup the environment inside the container?
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL,
It seems to be working: as you can see, no virtual environment is created; the only thing that is installed is the clearml-agent that i...
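For reference, one way to pass those variables into the task's container (a sketch; the image is just an example, and CLEARML_AGENT_SKIP_PIP_VENV_INSTALL can alternatively point at an existing python binary):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="skip venv demo")

# ask the agent (running inside the container) to reuse the container's
# pre-installed python environment instead of building a virtual env
task.set_base_docker(
    docker_image="python:3.10",
    docker_arguments="-e CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1",
)
```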
Hi VexedCat68
Check this example:
https://github.com/allegroai/clearml/blob/4f9aaa69ed2d5b8ea68ebee5508610d0b1935d5f/examples/scheduler/trigger_example.py#L44
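The gist of that example is roughly this (a sketch; names, IDs, and queues are placeholders):
```python
from clearml.automation import TriggerScheduler

# poll the server every few minutes for trigger conditions
trigger = TriggerScheduler(pooling_frequency_minutes=3)

# clone + enqueue a template task whenever a new dataset appears in a project
trigger.add_dataset_trigger(
    name="launch on new dataset",
    schedule_task_id="<template_task_id>",
    schedule_queue="default",
    trigger_project="data project",
)

# run the scheduler itself as a long-lived service task
trigger.start_remotely(queue="services")
```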
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server; it automatically stores the git repo link, commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (which is effectively always the case, even if it is actually the same machine).
The trains-agent will create a new virtual-environmen...
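In code, that round-trip looks something like this (a sketch; project and queue names are placeholders):
```python
from clearml import Task

# running this locally registers the experiment on the server:
# git repo link, commit ID, uncommitted diff, and installed packages
task = Task.init(project_name="examples", task_name="remote demo")

# stop the local run here and re-launch the task through an agent
# listening on the given queue
task.execute_remotely(queue_name="default", exit_process=True)

# everything below only runs on the agent's machine
print("training...")
```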
Here this new entry in the log is 2 min after env completed =>
1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415
This seems to be something in your code; just add print("starting") in your entry python file, before any imports (because they might actually do something)
Because from the agent's perspective, after printing Starting Task Execution:
it literally calls the python script, nothing else...
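i.e. at the very top of the entry script:
```python
# must be the very first statement, before any imports, so we can tell
# whether the delay happens before the script's own code even starts
print("starting", flush=True)

# ... all the imports and the actual code go below ...
```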
Hmm, so the SaaS service? And when you delete (not archive) a Task, it does not ask for S3 credentials when you select "delete artifacts"?
So I checked the code, and the Pipeline constructor internally calls Task.init; that means that after you construct the pipeline object, Task.current_task() should return a valid object....
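Something along these lines (a sketch; names are placeholders):
```python
from clearml import Task
from clearml.automation import PipelineController

# the constructor internally calls Task.init and creates the controller task
pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")

# so right after construction this should already be a valid Task, not None
print(Task.current_task())
```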
let me know what you find out
TrickyRaccoon92 actually Click is on the to-do list as well ...
OddAlligator72 what you are saying is, take the repository / packages from the runtime, aka the python code calling the "Task.create(start_task_func)" ?
Is that correct ?
BTW: notice that the execution itself will be launched on other remote machines, not on this local machine
Task.completed(ignore_errors=False)
What are you getting?
OddAlligator72 I like this idea.
The single thing I'm not sure about is the "function entry point"
Why would one do that? Meaning, why wouldn't you have a proper python entry-point?
The reason I'm reluctant is that you might have calls/functions/variables in the global scope of the file storing the function, and then users will not know why something broke, and it will be very cumbersome to debug.
A simple script entry point seems trivial to launch and debug locally.
What do you think ? What woul...
You might be able to also find out exactly what needs to be pickled using the f_code of the function (but that's limited to the CPython implementation).
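For illustration, a toy example of poking at the code object (the function and the global name here are made up):
```python
def my_func(x, y=2):
    z = x + y
    return z * scale_factor  # 'scale_factor' would come from module scope

# __code__ (f_code) is the CPython code object behind the function
code = my_func.__code__
print(code.co_varnames)  # local variables: ('x', 'y', 'z')
print(code.co_names)     # global names it references: ('scale_factor',)
```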
Nice!
pywin32 isn't in my requirements file,
CloudySwallow27 what's the OS/env?
(pywin32 is not in the direct requirements of the agent)
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
yes, or (because I deployed clearml using helm in kubernetes) from the same machine, but multiple pods (tasks).
Oh now I see. Long story short, no 😞 the correct way of doing that is: every node/pod creates its own dataset,
then when you are done, you create a new version with the X datasets that you created as parents, the newly created version is just "meta" it basically tells the system how to combine the previously generated datasets (i.e. no data is actually re-uploa...
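Roughly like this (a sketch; IDs and paths are placeholders):
```python
from clearml import Dataset

# each node/pod creates and finalizes its own dataset version
shard = Dataset.create(dataset_project="data", dataset_name="shard-pod-1")
shard.add_files("/data/local_shard")
shard.upload()
shard.finalize()

# once all pods are done, create a "meta" version whose parents are the
# shard datasets - no data is actually re-uploaded
combined = Dataset.create(
    dataset_project="data",
    dataset_name="combined",
    parent_datasets=["<shard_1_id>", "<shard_2_id>"],
)
combined.finalize()
```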
EnviousStarfish54 are those scalars reported ?
If they are, you can just do:
task_reporting = Task.init(project_name='project', task_name='report')
tasks = Task.get_tasks(project_name='project', task_name='partial_task_name_here')
for t in tasks:
    t.get_last_scalar_metrics()
    task_reporting.get_logger().report_something
the latter is an ec2 instance
and the agent fails to install on the ec2 machine ?
GiddyTurkey39 do you have an experiment with the jupyter notebook ?
Are you seeing the entire jupyter notebook in the "uncommitted changes" section?
I've seen that the file location of a task is saved
What do you mean by that? is it the execution section "entry point" ?
That is exactly it: the trains-agent replicates the code from the git repo and tries to apply the git diff (see the "uncommitted changes" section). Obviously it failed 🙂
Hi @<1545216070686609408:profile|EnthusiasticCow4>
will ClearML remove the corresponding folders and files on S3?
Yes, and it will ask you for credentials as well. I think there is a way to configure it so that the backend has access to it (somehow), but this breaks the "federated" approach
What's the clearml-server version ?
Then what happens is that Task.current_task() returns None for the pipeline's task...
Hmm that sounds like the pipeline Task was closed?! could that be? where (in the code) is the call to Task.current_task ?
SoreDragonfly16 could you reproduce the issue?
What's your OS? Which trains version?
Hi ProudMosquito87
My apologies, there is still no concrete ETA ...
That said I think a good toy example would really help accelerate this process.
How about opening a PR with a nice hydra example, then we can start discussing implementation details based on the toy example ?
Hi WackyRabbit7
So I'm assuming this is after start_locally is called?
Which clearml version are you using ?
(just making sure, calling Task.current_task() before starting the pipeline returns the correct Task?)
Just making sure, after the pipe object is created, you can call Task.current_task(), is that correct?
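i.e. the check would look something like this (a sketch; the step definitions are elided):
```python
from clearml import Task
from clearml.automation import PipelineController

pipe = PipelineController(name="demo", project="examples", version="1.0.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) calls go here ...

# the sanity check discussed above: should print a valid Task, not None
print("current task:", Task.current_task())

# run the pipeline logic in this process instead of enqueuing it
pipe.start_locally(run_pipeline_steps_locally=True)
```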